A sample is biased when some members of the relevant population had a greater probability of being selected to be part of the sample than others. This results in a sample that is unrepresentative of the population as a whole, and thus any inferences made on the basis of this sample about the wider population are likely to be inaccurate. For example, if a survey is more likely to contact women than men even though the underlying population of interest has an equal number of men and women, then the resulting sample will be biased. Biased samples are very common, partly because it is often much easier to obtain a biased sample than an unbiased one. Some common factors contributing to the selection of biased samples include:
- Self-selection: in many surveys and case studies, subjects must agree to be interviewed, agree to have their case recorded, or voluntarily bring the information to the relevant recording body. In all such cases, those who choose to be interviewed, share their story, or make a report are likely to differ systematically, in various and sometimes unpredictable ways, from those who choose not to. A particularly common example is that people with very strong views on an issue, one way or the other, are typically more likely to share their opinion than those who are uncertain or hold moderate views. This creates a false impression that most people are far more strongly opinionated about the issue than is actually the case. Because of this self-selection bias, any statistics derived from self-reports must be interpreted with caution.
- Sampling biases: whenever a sample must be collected, some method for selecting it from the wider population must be devised. Many different sampling methods have been developed, but none is perfect at delivering a completely unbiased sample. Consider, for example, a telephone survey. It is much more likely to include people who are at home during the day when the survey is conducted, and it cannot include those who do not have a telephone or do not answer it. A survey that approaches people on the street will only include those who frequent that particular location at that particular time, who are unlikely to be representative of the broader population. Even subtler biases may be present: interviewers may be somewhat more likely to approach passers-by who seem friendly, while avoiding those who seem less likely to speak English well. All of these factors combine to produce samples that are biased relative to the population as a whole.
- Survivorship bias: in many situations, naively sampling from the observable population produces a biased sample, because members of the population that did not survive as long are much less likely to be sampled. For example, if vacationers at a resort were surveyed about how long they had been staying, the results would be biased upwards: guests who stay longer are more likely to be present on the day the survey is conducted, and thus more likely to be included in the sample. As another example, surveying all lawn-mowing businesses listed in the phone directory about how much money they made in their first year of operation would give a biased sample if the goal is to study newly established lawn-mowing businesses, since businesses that have already failed would no longer be listed in the directory and would therefore be excluded from the sample.
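The self-selection effect can be illustrated with a small simulation. This is a toy sketch under assumed numbers, not a model of any real survey: opinions are drawn uniformly from [-1, 1], and a hypothetical response model gives each person a probability of 0.05 + 0.9 * |opinion| of volunteering an answer. The function name `simulate_self_selection` and the 0.8 threshold for a "strong" view are illustrative choices.

```python
import random


def simulate_self_selection(n=100_000, seed=0):
    """Toy model (assumed, not from any real survey): each person holds an
    opinion in [-1, 1], and the chance they volunteer a response grows with
    how strongly they hold it, i.e. with |opinion|."""
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    # Hypothetical response model: P(respond) = 0.05 + 0.9 * |opinion|
    respondents = [x for x in population if rng.random() < 0.05 + 0.9 * abs(x)]

    def share_strong(opinions, threshold=0.8):
        # Fraction of people whose views are "strong" (|opinion| > threshold)
        return sum(abs(x) > threshold for x in opinions) / len(opinions)

    return share_strong(population), share_strong(respondents)


pop_share, resp_share = simulate_self_selection()
print(f"strongly opinionated: population {pop_share:.1%}, respondents {resp_share:.1%}")
```

Under these assumptions, roughly 20% of the population holds a strong view, but about a third of respondents do (the expected share is 0.172 / 0.5, or about 34%). Nobody's opinion changed; only the question of who chose to answer did.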
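The vacationer example can be sketched the same way. This particular effect is often called length-biased sampling, a close relative of survivorship bias: a guest who stays k nights is on site on k different days, so a survey held on one day is k times as likely to catch them. The sketch below assumes, hypothetically, that arrivals are spread uniformly over a 1,000-day window and that stays of 1 to 14 nights are equally likely. For simplicity it compares total length of stay rather than time stayed so far; `simulate_resort_survey` is an illustrative name.

```python
import random
from statistics import mean


def simulate_resort_survey(n_guests=200_000, horizon=1_000, survey_day=500, seed=1):
    """Toy model (assumed): guests arrive on a uniformly random day in
    [0, horizon) and stay 1-14 nights, each length equally likely.
    The survey interviews everyone on site on `survey_day`."""
    rng = random.Random(seed)
    guests = [(rng.randrange(horizon), rng.randint(1, 14)) for _ in range(n_guests)]
    all_stays = [stay for _, stay in guests]

    # A guest is on site on survey_day if they arrived on or before it
    # and have not yet left (they depart on day arrival + stay).
    surveyed = [stay for arrival, stay in guests
                if arrival <= survey_day < arrival + stay]
    return mean(all_stays), mean(surveyed), len(surveyed)


overall_mean, surveyed_mean, n_surveyed = simulate_resort_survey()
print(f"mean stay, all guests:      {overall_mean:.2f} nights")
print(f"mean stay, surveyed guests: {surveyed_mean:.2f} nights (n={n_surveyed})")
```

Under these assumptions the average stay among surveyed guests should come out near 9.7 nights (the length-weighted mean, 72.5 / 7.5), against 7.5 nights across all guests.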