Calculating an appropriate sample size is an important part of any research study, especially in the medical field. It determines the validity and applicability of the study's key findings. Different types of study follow different approaches to calculating the sample size.
Defining sample size
Sample size is the number of observations drawn from a particular population. Sample size is represented as ‘n’ and is a positive integer.
For example, if you want to determine how people in a specific age group respond to a social policy, you can test the policy on a small group drawn from that target population and record their responses. The sample size here is the number of people who were interviewed.
The choice of sample size depends on both statistical and non-statistical considerations. Non-statistical considerations include the availability of a sampling frame, resources, ethics and budget, whereas statistical considerations include the expected prevalence and the precision required in estimating it.
Aspects to be considered while determining the sample size are:
- Population size – Since the sample represents a proportion of the overall population, the total size of the population under study must be identified. If the exact population size is unknown, it can be approximated as a range between two numbers.
- Margin of error – No sample is perfect, so a small amount of error must be allowed. The margin of error, which is the half-width of the confidence interval, indicates how far above or below the population mean the sample mean can fall. A margin of error of +/- 5% is a commonly used standard.
- Degree of variability – This refers to the distribution of attributes in the population, and it varies with the attributes and the target population. The more heterogeneous the population, the larger the sample size required to achieve a given level of precision. The more homogeneous the population, the smaller the sample size needed. A proportion of 50% represents the greatest level of variability.
- Confidence level – The confidence level, whose complement is the risk level, describes what would happen if the population were sampled repeatedly: at a 95% confidence level, about 95 out of 100 samples would produce an interval that contains the true population value.
- Standard deviation – This represents how much variation is expected in the responses. When no prior estimate is available, a value of 0.5 is generally used, because it is the most conservative choice and ensures that the sample is large enough.
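To make two of these aspects concrete, the following Python sketch converts a confidence level into the z-score used in sample size formulas and shows that the term p(1 - p) is largest at a proportion of 50%. It assumes a two-sided interval and a normal approximation; the function names `z_score` and `variability` are illustrative and do not come from any particular library.

```python
from statistics import NormalDist


def z_score(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution.

    Assumption: the confidence level is given as a fraction, e.g. 0.95.
    """
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def variability(p: float) -> float:
    """The p * (1 - p) term that measures variability of a proportion."""
    return p * (1.0 - p)


# Higher confidence levels map to larger z-scores.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_score(level):.3f}")

# Variability is largest when the proportion is 0.5.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f} -> p(1 - p) = {variability(p):.2f}")
```

Because p(1 - p) cannot exceed 0.25, using 0.5 when nothing is known about the population gives the largest, and therefore safest, sample size.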
Sample size can be determined using various approaches. These include applying formulas, using a census, employing published tables and using the sample size of a similar study. Let’s have a detailed look at each method.
- Applying formulas
This is one of the most commonly used methods in the determination of sample size. Generally, the formula for sample size is given by,
$$\text{Sample size} = \frac{(Z\text{-score})^2 \times \text{StdDev} \times (1 - \text{StdDev})}{(\text{margin of error})^2}$$
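As a rough illustration, the general formula above can be evaluated in a few lines of Python. This is a minimal sketch, assuming a two-sided confidence level, a standard deviation of 0.5 and a margin of error of 5% as defaults; the function name `sample_size` and its parameter names are hypothetical. The result is rounded up, since a sample cannot contain a fraction of a person.

```python
import math
from statistics import NormalDist


def sample_size(confidence_level: float = 0.95,
                std_dev: float = 0.5,
                margin_of_error: float = 0.05) -> int:
    """Sample size = Z^2 * StdDev * (1 - StdDev) / (margin of error)^2."""
    # Z-score for a two-sided interval at the given confidence level.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    n = (z ** 2) * std_dev * (1.0 - std_dev) / (margin_of_error ** 2)
    # Round up to the next whole observation.
    return math.ceil(n)


print(sample_size())                       # defaults: 95%, 0.5, +/- 5%
print(sample_size(confidence_level=0.99))  # stricter confidence level
print(sample_size(margin_of_error=0.03))   # tighter margin of error
```

With the defaults the function returns 385, the figure often quoted for surveys at a 95% confidence level and a 5% margin of error.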
However, it is important to note that there is no universal formula for calculating the sample size. For example, the formulas used to calculate sample size for cross-sectional studies are
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^2 \; p\,(1-p)}{d^2} \quad \text{(qualitative variable)}$$
and
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^2 \; SD^2}{d^2} \quad \text{(quantitative variable).}$$
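The two cross-sectional formulas can be sketched in Python as follows. The sketch assumes that p is the expected proportion, SD is the expected standard deviation and d is the absolute precision, all taken from earlier studies or a pilot; the function names and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def _z(confidence_level: float) -> float:
    """Z value at 1 - alpha/2 for the given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)


def n_qualitative(p: float, d: float, confidence_level: float = 0.95) -> int:
    """Sample size for estimating a proportion: Z^2 * p * (1 - p) / d^2."""
    return math.ceil(_z(confidence_level) ** 2 * p * (1.0 - p) / d ** 2)


def n_quantitative(sd: float, d: float, confidence_level: float = 0.95) -> int:
    """Sample size for estimating a mean: Z^2 * SD^2 / d^2."""
    return math.ceil(_z(confidence_level) ** 2 * sd ** 2 / d ** 2)


# Hypothetical inputs: expected prevalence of 20%, precision of +/- 5 points.
print(n_qualitative(p=0.20, d=0.05))
# Hypothetical inputs: expected SD of 15 units, precision of +/- 2 units.
print(n_quantitative(sd=15.0, d=2.0))
```

Note that d must be expressed in the same units as the quantity being estimated: a fraction for a proportion, and the measurement unit for a mean.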
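For case-control studies,
$$\text{Sample size} = \frac{r+1}{r} \cdot \frac{p^{*}\,(1-p^{*})\,(Z_{\beta} + Z_{\alpha/2})^2}{(p_1 - p_2)^2}$$
where r is the ratio of controls to cases, p* is the average proportion exposed, and p1 - p2 is the difference in the proportion exposed between cases and controls.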
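A minimal Python sketch of the case-control calculation is shown below. It uses the standard form of this formula, in which the sum of the two z-values is squared, and it assumes that p* is the simple average of p1 and p2, that Zβ corresponds to the desired power and that Zα/2 corresponds to a two-sided significance level. The function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def n_case_control(p1: float, p2: float, r: float = 1.0,
                   power: float = 0.80, alpha: float = 0.05) -> int:
    """Number of cases needed for a case-control study.

    p1: proportion exposed among cases
    p2: proportion exposed among controls
    r:  ratio of controls to cases
    """
    normal = NormalDist()
    z_beta = normal.inv_cdf(power)                 # e.g. about 0.84 for 80% power
    z_alpha = normal.inv_cdf(1.0 - alpha / 2.0)    # e.g. about 1.96 for alpha = 0.05
    p_star = (p1 + p2) / 2.0                       # assumed: average proportion exposed
    n = ((r + 1.0) / r) * p_star * (1.0 - p_star) * (z_beta + z_alpha) ** 2
    n /= (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical inputs: 35% of cases and 20% of controls exposed, one control per case.
print(n_case_control(p1=0.35, p2=0.20))
```

The smaller the difference between p1 and p2, the larger the required sample, because the squared difference sits in the denominator.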
- Using census
A census can be used for a small population. This method eliminates sampling error and provides data on every member of the population. A notable feature of this method is that the cost of developing the questionnaire remains fixed, irrespective of the number of respondents. For a small population, virtually the whole population must be sampled anyway to obtain the desired level of precision.
- Using published tables
Another method is to use published tables that give the sample size for a given set of criteria. However, when using published tables, it must be assumed that the attributes under investigation are normally distributed. If this assumption is not met, then the entire population must be surveyed.
- Using the sample size of a similar study
The fourth method is to use the sample size from studies that are similar to yours. Review the literature in your field of study and identify the sample sizes used. However, before adopting a sample size, review the approach used in that study to avoid the risk of repeating its errors in your own study.
Great care should be taken while determining the sample size. If the calculated sample size is too small, the confidence level will be lower and the margin of error will be higher, making the data less reliable. Also, trying to achieve a confidence level beyond 95% can be unrealistic, because the required sample size grows quickly.
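The effect of a small sample on the margin of error can be illustrated by rearranging the general formula so that it is solved for the margin of error instead of the sample size. The Python sketch below does this under the same assumptions as before (a normal approximation and a standard deviation of 0.5); the function name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, confidence_level: float = 0.95,
                    std_dev: float = 0.5) -> float:
    """Margin of error = Z * sqrt(StdDev * (1 - StdDev) / n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    return z * math.sqrt(std_dev * (1.0 - std_dev) / n)


# The margin of error widens as the sample shrinks.
for n in (1000, 385, 100, 50):
    print(f"n = {n:4d} -> margin of error = +/- {margin_of_error(n):.1%}")
```

Since the margin of error shrinks only with the square root of n, halving it requires roughly four times as many observations.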