How to test research data using parametric or nonparametric tests

In simple terms, parametric data analysis procedures rely on being fed with data whose underlying distribution has known parameters; typically, that means data that are normally distributed (the normal distribution gives the bell shape on a histogram). This generally makes parametric procedures more sensitive, so people usually prefer to apply them where possible.

Nonparametric procedures make far fewer assumptions about the underlying data distribution and are more robust. However, we pay for this robustness with reduced sensitivity: there is an increased chance of missing a significant effect when using the rough and ready nonparametric tests. The chance of detecting a significant effect that really does exist is called the statistical power of the experiment. Researchers would like this to be as high as possible; 80% or more is generally considered good.
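To put a rough number on statistical power, here is a minimal sketch using only the Python standard library. It assumes a two-sided comparison of two equal-sized groups and uses the normal (z) approximation, so it slightly overstates the power of a real t-test on small samples. The function name approx_power and the example effect size of 0.5 standard deviations are illustrative assumptions, not something from a particular study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size is the difference in means divided by the common
    standard deviation. Normal approximation only (an assumption).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 20, 64, 100):
    print(n, "per group -> power", round(approx_power(0.5, n), 2))
```

Under these assumptions, a medium effect of 0.5 standard deviations needs roughly 64 subjects per group to reach about 80% power; smaller samples fall well short of that.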

When should we set aside the parametric tests in favor of the less sensitive nonparametric equivalents? Usually, we would drop to nonparametric tests if the data we are analyzing differ significantly from a normally distributed data set; this might be due to the shape of the distribution or the presence of outliers. This would be even more appropriate if the sample size is quite small (e.g. below 15 or 20), since one outlier in 15 data points will have a greater effect than one outlier in 1500 data points. Scores would typically be treated as nonparametric, as would ordinal and nominal data.
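The point about one outlier in 15 versus one in 1500 can be checked with a little arithmetic. The sketch below uses made-up values (every point is 10 except a single outlier of 100) purely for illustration; the function name is hypothetical.

```python
from statistics import mean, median


def shift_from_outlier(n, typical=10.0, outlier=100.0):
    """Return how far one outlier moves the mean and the median of n points."""
    clean = [typical] * n
    contaminated = [typical] * (n - 1) + [outlier]
    return (mean(contaminated) - mean(clean),
            median(contaminated) - median(clean))


for n in (15, 1500):
    mean_shift, median_shift = shift_from_outlier(n)
    print(f"n={n}: mean moves by {mean_shift:.2f}, median by {median_shift:.2f}")
```

With these values the single outlier drags the mean of the small sample a hundred times further than the mean of the large one, while the median does not move at all. This is part of why rank-based methods are described as more robust.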

What are the penalties of getting this wrong?

If you use a parametric test on nonparametric data then this could trick the test into seeing a significant effect when there isn’t one. This is very dangerous; proper statisticians call this a “type one error”. A type one error is a false-positive result. If you use a nonparametric test on parametric data then this could reduce the chance of seeing a significant effect when there is one. This is not ideal; proper statisticians call this a “type two error”. A type two error is a missed opportunity, i.e. we have failed to detect a significant effect that truly does exist.

Of these two errors, which is less dangerous? I feel that the type two error is the less dangerous. Think of your research question as being a crossroads in knowledge. You are sitting in your car at a fork in the road: should you go left or right? A type one error would be to go down the wrong road; you would be actively going in the wrong direction. A type two error would be to sit there not knowing which way is correct; eventually, another researcher will come along and hopefully have a map.

So, to summarize: using a parametric test in the wrong context may lead to a type one error, a false positive. Using a nonparametric test in the wrong context may lead to a type two error, a missed opportunity. (We will address this again when talking about interpreting p-values.)
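To make the two error rates concrete, here is a small simulation sketch. It only illustrates what the rates mean when a test's assumptions hold; it does not demonstrate the effect of applying a test in the wrong context. It assumes normally distributed data with a known standard deviation of 1 and uses a simple z-test in place of a t-test; the function names, sample size of 20 per group and effect of 0.5 standard deviations are all illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def z_test_p(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_difference, n=20, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated experiments that report p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(true_difference, 1.0) for _ in range(n)]
        if z_test_p(a, b) < alpha:
            hits += 1
    return hits / trials


# No real effect: every "significant" result is a type one error.
false_positive_rate = rejection_rate(0.0)
# Real effect of 0.5 SD: every non-significant result is a type two error.
power = rejection_rate(0.5)
print("type one error rate:", false_positive_rate)
print("type two error rate:", round(1 - power, 3))
```

When the assumptions hold, the type one error rate should sit close to the chosen significance level of 0.05, while the type two error rate depends on the sample size and the size of the real effect.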

Before choosing a statistical test to apply to your data, you should address the issue of whether your data are parametric or not. This is quite a subtle and convoluted decision, but the guideline here should help start you thinking. Remember, the important rule is not to make unsupported assumptions about the data; don’t just assume the data are parametric.

You can use academic precedent to share the blame. For example, “Bloggs et al. 2001 used a t-test so I will”. You might test the data for normality, or you might decide that, given a small sample, it is sensible to opt for nonparametric methods to avoid making assumptions.

  • Ranks, scores, or categories are generally non-parametric data.
  • Measurements that come from a population that is normally distributed can usually be treated as parametric.

Note: If in doubt, treat your data as non-parametric, especially if you have a relatively small sample.
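The guideline above can be sketched as a small helper. This is only one possible reading of it: the cut-off of 20 observations, the 0.05 threshold and the function names are assumptions, and the normality check used here is the Jarque-Bera test (based on skewness and kurtosis), chosen because it needs nothing beyond the Python standard library. Its p-value is a large-sample approximation and is unreliable for small samples, which is another reason the helper falls back to non-parametric methods when the sample is small.

```python
from math import exp
from statistics import NormalDist


def jarque_bera(data):
    """Return (statistic, approximate p-value) for the Jarque-Bera test."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    # Under normality JB is roughly chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-x / 2).
    return jb, exp(-jb / 2)


def suggest_family(scale, data, min_n=20, alpha=0.05):
    """Apply the rule of thumb from the text; thresholds are assumptions."""
    if scale in ("nominal", "ordinal"):
        return "non-parametric"   # ranks, scores, categories
    if len(data) < min_n:
        return "non-parametric"   # small sample: avoid assumptions
    _, p = jarque_bera(data)
    return "parametric" if p >= alpha else "non-parametric"


# Illustrative data: evenly spaced normal quantiles, then a skewed version.
z = NormalDist()
bell_shaped = [z.inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [exp(x) for x in bell_shaped]
print(suggest_family("interval", bell_shaped))
print(suggest_family("interval", skewed))
print(suggest_family("ordinal", bell_shaped))
```

A helper like this is a starting point for thinking, not a substitute for looking at a histogram of the data.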

Key Points

  • Generally speaking, parametric data are assumed to be normally distributed. The normal distribution (also known as the Gaussian distribution) is a data distribution with more data values near the mean and gradually fewer further away, symmetrically. A lot of biological data fit this pattern closely. To sensibly justify applying parametric tests, the data should be normally distributed.
  • If you are unsure about the distribution of the data in a target population then it is safest to assume the data are non-parametric. The cost of this is that the non-parametric tests are generally less sensitive and so you would stand a greater chance of missing a small effect that does exist.
  • Tests that depend on an assumption about the distribution of the underlying population data (e.g. t-tests) are parametric because they assume that the data being tested come from a normally distributed population (i.e. a population we know the parameters of).
  • Tests that do not depend on many assumptions about the underlying distribution of the data are called non-parametric tests. These include the Wilcoxon signed-rank test, the Mann-Whitney test, and Spearman’s rank correlation coefficient. They are used widely to test small samples of ordinal data.
  • Are you looking for differences or correlation? You can look for differences whenever you have two sets of data. (It might not always be a sensible thing to do but you can do it!)
  • You can only look for correlation when you have a set of paired data, i.e. two sets of data where each data point in the first set has a partner in the second. If you aren’t sure about whether your data are paired, review the section on paired data.
  • You might, therefore, look for the differences in some attributes before and after some intervention.
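To show what "not depending on the distribution" looks like in practice, here is a minimal sketch of the Mann-Whitney test written with only the Python standard library. It works on the ranks of the data rather than the values themselves. The p-value uses the normal approximation without a tie or continuity correction, which is an assumption made to keep the sketch short; for very small samples exact tables or a statistics package would normally be used. The function names and the two sets of scores are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, approximate two-sided p) using the normal approximation."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma                        # always <= 0 because u <= mu
    return u, 2 * NormalDist().cdf(z)


# Hypothetical ordinal scores (0-10) for two independent groups.
control = [7, 6, 8, 5, 7, 9, 6, 8]
treated = [4, 5, 3, 6, 4, 2, 5, 3]
u, p = mann_whitney_u(control, treated)
print("U =", u, "approximate p =", round(p, 4))
```

Because only the ordering of the scores matters, replacing the largest score with an extreme outlier would leave U unchanged.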

Ponder these two questions:

  1. Does paracetamol lower temperature?
  2. Does the number of exercises performed affect the amount of increase in muscle strength?

Which of these is about a difference and which is addressing correlation? Well, they aren’t all that well described, but I reckon the first question is about seeing a difference (“does this drug make a difference?”) and the second is about correlation, i.e. does the amount of exercise correlate with the increase in muscle strength?
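The two questions lead to two different calculations, which the sketch below tries to show side by side. All the numbers are invented for illustration, and the function names are hypothetical. For the first question it simply looks at the paired before-and-after differences; for the second it computes Spearman’s rank correlation coefficient as the ordinary (Pearson) correlation of the ranks.

```python
from statistics import mean, median


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Question 1 (difference): temperature before and after paracetamol.
before = [38.9, 39.2, 38.5, 39.0, 38.7, 39.4]
after = [37.8, 38.1, 38.0, 37.9, 38.2, 38.3]
differences = [a - b for a, b in zip(after, before)]
print("median change in temperature:", round(median(differences), 2))

# Question 2 (correlation): exercises performed vs. strength gain.
exercises = [5, 10, 15, 20, 25, 30]
strength_gain = [1.0, 2.5, 2.0, 4.0, 4.5, 6.0]
print("Spearman's rho:", round(spearman_rho(exercises, strength_gain), 3))
```

Both calculations need paired data, but they answer different questions: the first asks whether the values shifted, the second asks whether the two measurements rise and fall together.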

A variant of this arises when conducting a reliability study. In many respects, the data structure is similar to that of a correlational experiment; however, the technique used to analyze the data is different.