Epidemiology: Definition, Core Methods, and Applications in Global Health

In March 2014, health authorities in Guinea identified Ebola virus as the cause of an unusual cluster of haemorrhagic fever cases in the south-eastern prefecture of Guéckédou. Within weeks, investigators from the Ministry of Health and the WHO had fanned out across villages, interviewing the sick and the bereaved, mapping contacts, tracing chains of transmission, and testing blood samples. What they uncovered - a zoonotic spillover of Ebola virus, probably from bats, amplified through funeral rites that brought mourners into direct contact with highly infectious bodies - was the opening chapter of the worst Ebola epidemic in recorded history. By the time the West African outbreak was declared over in 2016, more than 11,000 people had died across Guinea, Sierra Leone, and Liberia. The investigation that unravelled the source, the transmission dynamics, and the effective reproduction number was epidemiology at its operational core: a discipline built on the systematic study of how, where, and why disease occurs in populations.

That example is worth holding in mind as this article works through the epidemiology definition, its principal methods, the measures it uses to quantify disease in populations, and its applications to nutrition and infectious disease research in Sub-Saharan Africa. Epidemiology is both a science and a craft, and both aspects deserve careful treatment.

What Is Epidemiology? A Working Definition

The epidemiology definition most widely cited in the academic literature comes from John Last’s Dictionary of Epidemiology: the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems. Three elements of that definition deserve unpacking.

Distribution refers to patterns - who gets sick, where, and when. Person, place, and time constitute the descriptive triad that anchors every epidemiological investigation. Characterising distribution requires careful attention to age, sex, socioeconomic position, ethnicity, occupation, and geography. A condition concentrated in children under five signals something very different from one spread uniformly across age groups.

Determinants refers to the causal forces at work - pathogens, nutritional exposures, environmental toxins, behaviours, and the structural conditions that shape all of these. Moving from description to determination requires analytical methods capable of separating signal from noise, and causes from mere correlates.

Specified populations marks the boundary between epidemiology and clinical medicine. The clinician is concerned with the individual patient; the epidemiologist is concerned with the group. Individual clinical findings feed into population-level inferences, but the unit of analysis is always the collective (Rothman, Greenland, & Lash, 2008).

This distinction matters practically. A clinician seeing five children with severe acute malnutrition in a single week may note a worrying trend; the epidemiologist calculates whether that count exceeds the expected baseline, adjusts for reporting changes, maps the cases geographically, and asks whether something systematic has shifted - a failed harvest, a displacement event, a disrupted supply chain for therapeutic foods.
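The question of whether a weekly count exceeds the expected baseline can be sketched with a Poisson model. This is a minimal illustration, assuming weekly counts of severe acute malnutrition follow a Poisson distribution with a hypothetical historical mean of 1.5 cases per week; the function name `poisson_upper_tail` is illustrative, and operational surveillance systems typically use baselines that also account for seasonality, trend, and reporting changes.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Probability of seeing `observed` or more cases if the true weekly mean is `expected`.

    Assumes Poisson-distributed counts: P(X >= k) = 1 - sum over i < k of e^-m * m^i / i!
    """
    below = sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(observed)
    )
    return 1.0 - below


# Hypothetical example: five severe acute malnutrition cases this week
# against an assumed historical baseline of 1.5 cases per week.
p_value = poisson_upper_tail(observed=5, expected=1.5)
print(f"P(X >= 5 | baseline 1.5) = {p_value:.3f}")  # roughly 0.019
```

A small tail probability flags the week for investigation; it does not by itself explain what has changed.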

Descriptive and Analytical Epidemiology

The discipline divides broadly into two modes, each with distinct objectives.

Descriptive epidemiology characterises the occurrence of disease without testing causal hypotheses. It generates the maps, time-trend graphs, and demographic breakdowns that reveal where a problem exists, how large it is, and whether it is changing. For surveillance programmes across Sub-Saharan Africa (SSA) - including the Health and Demographic Surveillance Systems (HDSS) that provide the empirical backbone of much regional research - descriptive output is the primary product. Without reliable description, intervention cannot be rationally targeted (Streatfield et al., 2014).

Analytical epidemiology moves beyond description to test specific hypotheses about causes or risk factors. Does exclusive breastfeeding reduce the risk of diarrhoeal disease in infants under six months? Does household-level food insecurity predict stunting after adjustment for maternal education and asset wealth? These are analytical questions requiring comparison groups, measures of association, and strategies to control for confounding. The choice of study design shapes what questions can be answered and what threats to validity must be managed (Bonita, Beaglehole, & Kjellström, 2006).

The two modes are not sequential - good surveillance systems generate hypotheses that analytical studies then test, and the findings of analytical studies inform what descriptive variables are worth collecting next.

Key Study Designs

Cross-Sectional Studies

In a cross-sectional study, both exposure and outcome are measured at a single point in time, or over a defined short period, in a defined population sample. The design is efficient and relatively cheap, making it the workhorse of nutritional surveillance in resource-constrained settings. National Demographic and Health Surveys, which have been fielded in most SSA countries since the 1980s, are cross-sectional in structure.

The principal analytic measure is prevalence, which cross-sectional data estimate directly. The key limitation is temporal ambiguity: when exposure and outcome are measured simultaneously, it is impossible to determine which came first. A cross-sectional study showing an association between dietary diversity and haemoglobin concentration cannot establish whether low dietary diversity caused anaemia, whether anaemia reduced appetite and thus dietary diversity, or whether both arose from a common antecedent.
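Because a cross-sectional survey estimates prevalence from a sample, the estimate should carry a confidence interval. The sketch below uses the Wilson score interval and assumes simple random sampling; the counts and the function name are hypothetical, and surveys with clustered, weighted designs such as the Demographic and Health Surveys need design-based variance estimation instead.

```python
import math


def prevalence_with_wilson_ci(cases: int, sample_size: int, z: float = 1.96):
    """Prevalence and approximate 95% Wilson score interval (simple random sample assumed)."""
    p = cases / sample_size
    z2 = z * z
    denominator = 1 + z2 / sample_size
    centre = (p + z2 / (2 * sample_size)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size**2)) / denominator
    return p, centre - half_width, centre + half_width


# Hypothetical survey: 120 of 400 sampled children are anaemic
p, lower, upper = prevalence_with_wilson_ci(120, 400)
print(f"prevalence {p:.1%} (95% CI {lower:.1%} to {upper:.1%})")  # 30.0% (95% CI 25.7% to 34.7%)
```

The Wilson interval is chosen here because, unlike the simple normal approximation, it stays within 0 and 1 when prevalence is low or the sample is small.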

Cohort Studies

Cohort designs follow a group of initially outcome-free individuals over time, classifying them by exposure status and observing who develops the outcome. The prospective cohort - in which exposure is measured before the outcome occurs - is generally regarded as the most rigorous of the observational designs for assessing causality. Retrospective cohorts reconstruct past exposure from existing records. The key analytic measures are incidence and the relative risk (RR), which quantifies how much more likely the exposed group is to develop the outcome than the unexposed group.

The COHORTS consortium, which pooled data from birth cohort studies in Brazil, Guatemala, India, the Philippines, and South Africa, demonstrated how early-life linear growth faltering independently predicted adult outcomes including educational attainment, wages, and chronic disease risk - findings that required decades of follow-up and could not have been obtained from any cross-sectional design (Victora et al., 2004). Cohort studies are discussed in depth in the companion article on cohort study design.

Case-Control Studies

Case-control studies recruit individuals who have already developed an outcome (cases) and compare them to those who have not (controls), looking back at prior exposures. The design is highly efficient for rare outcomes or those with long latency, since it does not require waiting for outcomes to develop in a prospective cohort. The analytic measure is the odds ratio (OR), which approximates the relative risk when the outcome is uncommon.

Selection of an appropriate control group is the central methodological challenge. Controls must come from the same source population as cases - that is, had they developed the outcome, they would have been eligible to be cases. Hospital controls are convenient but can introduce selection bias if hospitalisation is associated with the exposure under study. Community controls are more representative but harder to recruit (Grimes & Schulz, 2002).

Randomised Controlled Trials

The randomised controlled trial (RCT) is the experimental design that provides the strongest evidence for causal inference, because random allocation of exposure balances known and unknown confounders across groups on average, with chance imbalances shrinking as trial size grows. In nutritional epidemiology, RCTs have established the causal effects of zinc supplementation on childhood diarrhoea incidence, vitamin A supplementation on child mortality, and iron-folic acid on maternal anaemia. The Lancet's 2013 series on maternal and child nutrition, which synthesised evidence from dozens of trials and observational studies, estimated that scaling up ten proven nutrition interventions to 90% coverage in the 34 countries with the highest burden of stunting could prevent approximately 900,000 deaths of children under five annually (Black et al., 2013).
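The balancing property of randomisation can be illustrated with a short simulation. This is a sketch under stated assumptions: a hypothetical unmeasured household characteristic is present in about 30% of participants, and allocation is simple 1:1 randomisation by shuffling; real trials generate concealed allocation sequences, often with blocking or stratification. The function name `confounder_balance` is illustrative.

```python
import random


def confounder_balance(n_participants: int, seed: int = 2024) -> tuple[float, float]:
    """Share of each arm carrying a hypothetical unmeasured confounder after random allocation."""
    rng = random.Random(seed)
    # Hypothetical unmeasured characteristic present in about 30% of participants
    has_confounder = [rng.random() < 0.3 for _ in range(n_participants)]
    half = n_participants // 2
    arms = ["intervention"] * half + ["control"] * (n_participants - half)
    rng.shuffle(arms)  # allocation ignores every participant characteristic

    def share(arm: str) -> float:
        members = [c for c, a in zip(has_confounder, arms) if a == arm]
        return sum(members) / len(members)

    return share("intervention"), share("control")


for n in (20, 200, 20_000):
    treated, control = confounder_balance(n)
    print(f"n={n:>6}: intervention {treated:.2f} vs control {control:.2f}")
```

Small trials can be noticeably imbalanced by chance; balance improves as the number randomised grows, which is the sense in which randomisation balances confounders "on average".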

RCTs are not without limitations in public health contexts. Ethical constraints prevent randomising exposures known to be harmful. Many exposures of interest - dietary patterns, socioeconomic trajectories, household food security - cannot be randomised in any meaningful way. Trials of sufficient duration and scale to detect effects on slow-developing or long-latency outcomes such as stunting or cardiovascular disease are enormously expensive and operationally complex. Blinding is often impossible for dietary interventions. External validity - the generalisability of trial findings to real populations - depends heavily on how representative the trial population is and how faithfully the trial intervention mirrors what would actually be delivered at scale (Bhutta et al., 2013).

Measures of Disease Frequency

Incidence

Incidence quantifies the rate at which new cases arise in a population over time. Two formulations exist:

Cumulative incidence (also called risk) is the proportion of an initially disease-free population that develops the outcome during a defined observation period. It requires complete follow-up of the entire population and is expressed as a dimensionless proportion.

Incidence rate (also called incidence density) accounts for variable follow-up by dividing the number of new cases by the total person-time at risk contributed by the study population. Its unit is cases per person-time (e.g., per 1,000 person-years). Person-time is the sum of each individual’s time under observation, from study entry until outcome, censoring, or end of follow-up, whichever comes first.
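Both measures can be computed from individual follow-up records. The sketch below assumes a hypothetical cohort of four children followed for up to two years with no losses to follow-up; the `Participant` record and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Time at risk (years) from study entry until outcome, censoring, or end of follow-up
    years_at_risk: float
    developed_outcome: bool
    censored: bool = False  # lost to follow-up before the end of the observation period


def cumulative_incidence(cohort: list[Participant]) -> float:
    """Proportion of the initially disease-free cohort that developed the outcome.

    The simple proportion assumes complete follow-up, so censored records are rejected.
    """
    if any(p.censored for p in cohort):
        raise ValueError("simple cumulative incidence requires complete follow-up")
    cases = sum(p.developed_outcome for p in cohort)
    return cases / len(cohort)


def incidence_rate(cohort: list[Participant], per: float = 1000.0) -> float:
    """New cases per `per` person-years at risk; tolerates variable follow-up."""
    cases = sum(p.developed_outcome for p in cohort)
    person_years = sum(p.years_at_risk for p in cohort)
    return cases / person_years * per


# Hypothetical two-year cohort of four children, each followed to outcome or study end
cohort = [
    Participant(2.0, False),
    Participant(0.5, True),
    Participant(2.0, False),
    Participant(1.5, True),
]
print(cumulative_incidence(cohort))      # 2 cases / 4 children = 0.5 over two years
print(round(incidence_rate(cohort), 1))  # 2 cases / 6.0 person-years = 333.3 per 1,000 PY
```

If one child had been lost after six months, the simple proportion would no longer be valid, whereas the incidence rate would still use the six months that child contributed. Survival methods such as the Kaplan-Meier estimator are the usual way to estimate risk when follow-up is censored.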

Prevalence

Prevalence is the proportion of a population that has a given condition at a specified moment (point prevalence) or during a defined period (period prevalence). Unlike incidence, it captures both new and existing cases. Prevalence is the appropriate measure for conditions that endure over time - chronic infections, nutritional deficiencies, disability - and is the measure generated by cross-sectional surveys. For a detailed treatment of how incidence and prevalence relate and when each is appropriate, see the companion article Incidence vs Prevalence.
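The difference between point and period prevalence can be sketched from illness episodes recorded as start and end days. This assumes a hypothetical closed population of ten people with at most one episode per person; in practice, period-prevalence estimates often use an average or mid-period population as the denominator. The `Episode` alias and function names are illustrative.

```python
Episode = tuple[int, int]  # (first day ill, last day ill), inclusive


def point_prevalence(episodes: list[Episode], population: int, day: int) -> float:
    # Cases present on a single day: onset on or before it, not yet resolved
    current = sum(1 for start, end in episodes if start <= day <= end)
    return current / population


def period_prevalence(episodes: list[Episode], population: int, first: int, last: int) -> float:
    # Cases present at any time in the window: existing cases plus new onsets
    overlapping = sum(1 for start, end in episodes if start <= last and end >= first)
    return overlapping / population


# Hypothetical episodes among ten people (one episode per person)
episodes = [(0, 30), (10, 50), (40, 90), (95, 120)]
print(point_prevalence(episodes, population=10, day=45))              # 2 of 10 = 0.2
print(period_prevalence(episodes, population=10, first=0, last=100))  # 4 of 10 = 0.4
```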

Relative Risk and Odds Ratio

The relative risk compares the incidence of an outcome in an exposed group to its incidence in an unexposed group. A relative risk of 2.0 means the exposed group is twice as likely as the unexposed group to develop the outcome over the same period; a ratio of incidence rates (the rate ratio) is interpreted analogously. The odds ratio compares the odds of exposure among cases to the odds of exposure among controls (in case-control studies), or the odds of the outcome among exposed and unexposed (in cohort studies). When the outcome is rare (a common rule of thumb is an outcome risk below about 10%), the odds ratio approximates the relative risk; as the outcome becomes more common, the odds ratio lies further from 1 than the relative risk does.
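The rare-outcome approximation can be checked with a standard 2×2 table. The sketch below uses hypothetical cohort counts in which the exposed group has twice the risk of the unexposed in both scenarios; only the baseline risk changes. The same cross-product formula gives the exposure odds ratio in a case-control study.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """a, b = exposed with / without outcome; c, d = unexposed with / without outcome."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad / bc, computed from the same 2x2 layout."""
    return (a * d) / (b * c)


# Hypothetical cohorts of 1,000 exposed and 1,000 unexposed people
scenarios = {
    "rare (2% vs 1%)": (20, 980, 10, 990),
    "common (40% vs 20%)": (400, 600, 200, 800),
}
for label, table in scenarios.items():
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
# rare (2% vs 1%): RR = 2.00, OR = 2.02
# common (40% vs 20%): RR = 2.00, OR = 2.67
```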

Both measures can be confounded - distorted by third variables associated with both exposure and outcome. Multivariable regression, stratification, and matching are the primary analytical tools for confounding control, though each carries its own assumptions (Szklo & Nieto, 2014).
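Stratification can be sketched with the Mantel-Haenszel pooled odds ratio, one standard way to combine stratum-specific 2×2 tables. The counts below are hypothetical and constructed so that socioeconomic position (low vs high) is associated with both exposure and outcome, while the exposure has no effect within either stratum; the function names are illustrative.

```python
# (a, b, c, d): exposed cases, exposed non-cases, unexposed cases, unexposed non-cases
Table = tuple[int, int, int, int]


def crude_odds_ratio(strata: list[Table]) -> float:
    # Collapse the strata into a single table, ignoring the confounder
    a, b, c, d = (sum(cells) for cells in zip(*strata))
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: list[Table]) -> float:
    # Weighted combination of stratum-specific odds ratios: sum(ad/n) / sum(bc/n)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


strata = [
    (80, 120, 20, 30),  # low socioeconomic position: stratum OR = 1.0
    (10, 90, 40, 360),  # high socioeconomic position: stratum OR = 1.0
]
print(f"crude OR = {crude_odds_ratio(strata):.2f}")                      # 2.79, distorted by confounding
print(f"Mantel-Haenszel OR = {mantel_haenszel_odds_ratio(strata):.2f}")  # 1.00
```

Stratification only removes confounding by variables that were measured and stratified on, which is the same limitation that applies to multivariable adjustment.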

Bradford Hill Criteria for Causation

Judging whether an observed association reflects a causal relationship rather than confounding, bias, or chance is commonly guided by the considerations Austin Bradford Hill set out in his landmark 1965 address to the Royal Society of Medicine. The nine criteria (Hill himself called them "viewpoints") - strength of association, consistency across studies, specificity, temporality, biological gradient, plausibility, coherence, experimental evidence, and analogy - do not constitute a checklist but a framework for reasoned judgement.

Temporality is the only criterion considered necessary (though not sufficient): a cause must precede its effect. This is why cross-sectional data alone can rarely establish causal direction. Strength and consistency elevate confidence, particularly when large effects replicate across diverse settings and study designs. Biological gradient - the dose-response relationship - is persuasive when present but its absence does not negate causation, since some causal processes operate through threshold effects (Rothman, Greenland, & Lash, 2008).

Applied to nutritional epidemiology, the Bradford Hill framework has been used to evaluate claims ranging from the relationship between aflatoxin exposure and hepatocellular carcinoma in SSA populations to the causal role of dietary diversity in child stunting. In each case, no single study is sufficient; the synthesis of evidence types and settings is what builds the cumulative scientific case.

Applications in Sub-Saharan Africa

Sub-Saharan Africa presents epidemiology with both its most pressing problems and its most demanding operational environment. The burden of both undernutrition and infectious disease remains among the highest in the world, yet civil registration systems are incomplete, health facility data are unreliable in many settings, and population denominators from census data are often outdated.

HDSS sites - of which more than 40 operate across SSA - have been critical in filling this gap. By continuously monitoring defined populations with rigorous demographic and health data collection, HDSS sites provide the incidence and mortality data that routine health systems cannot reliably produce. The INDEPTH Network, which links HDSS sites across Africa and Asia, has published comparative analyses of cause-specific mortality, fertility transitions, and nutritional outcomes that have substantially shaped understanding of African health patterns (Streatfield et al., 2014).

Nutritional epidemiology in SSA has benefited from large-scale efforts to standardise measurement tools - the WHO Anthro software for anthropometric Z-scores, the Dietary Diversity Score validated by FAO, and the Household Food Insecurity Access Scale - that allow cross-country comparison while preserving sensitivity to local food systems. The Lancet's 2013 series on nutrition synthesised the epidemiological evidence base for interventions targeting the first 1,000 days, establishing that linear growth faltering begins in utero and that the window for effective intervention is both narrow and well-defined (Bhutta et al., 2013).
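Anthropometric Z-scores in the WHO Child Growth Standards are based on the LMS method, in which each age-and-sex reference point is summarised by a Box-Cox power (L), a median (M), and a coefficient of variation (S). The sketch below implements only the general LMS formula; the parameter values are hypothetical rather than taken from the WHO tables, the function name is illustrative, and WHO Anthro applies further adjustments to some weight-based indicators at extreme values that are omitted here.

```python
import math


def lms_z_score(measurement: float, L: float, M: float, S: float) -> float:
    """Z-score from LMS parameters: ((X/M)**L - 1) / (L*S), or ln(X/M)/S when L == 0."""
    if L == 0:
        return math.log(measurement / M) / S
    return ((measurement / M) ** L - 1) / (L * S)


# Hypothetical length-for-age reference point (not real WHO values): L = 1, M = 80.0 cm, S = 0.035
z = lms_z_score(73.6, L=1.0, M=80.0, S=0.035)
print(f"length-for-age Z = {z:.2f}")  # -2.29, below the -2 cut-off commonly used to define stunting
```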

The epidemiological transition - the shift from infectious to chronic non-communicable disease as the dominant cause of mortality - is underway in SSA but heterogeneous. Urban populations in South Africa, Kenya, and Ghana show NCD burden patterns increasingly comparable to high-income settings; rural populations in the Sahel and Central Africa remain overwhelmingly burdened by infectious disease and undernutrition. Epidemiology’s capacity to characterise both patterns simultaneously, and to track the demographic and dietary transitions driving the shift, is arguably its most important current contribution to regional public health.

For an account of how surveillance infrastructure has evolved to track these transitions, see The Evolution of Public Health Monitoring.

Limitations

Several structural limitations bear on the interpretation of epidemiological research, particularly in Sub-Saharan African contexts.

Measurement error is pervasive in nutritional epidemiology. Dietary recall methods are subject to respondent memory limitations, social desirability bias, and the challenge of capturing foods prepared in ways that make portion estimation difficult. Biomarker-based nutritional assessment is more objective but expensive and invasive at scale. When error in a single exposure is non-differential (unrelated to the outcome), associations are typically attenuated toward the null, so the true effect size is likely larger than studies report; differential error, by contrast, can bias estimates in either direction.
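Attenuation from non-differential measurement error (often called regression dilution) can be shown with a simulation. This sketch assumes a single exposure with variance 1, a true linear slope of 0.5, and independent random error of variance 1 added to the measured exposure; under these assumptions the observed slope should be about half the true value, because the attenuation factor is var(X) / (var(X) + var(error)). The function names are illustrative.

```python
import random


def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_attenuation(n: int = 20_000, true_slope: float = 0.5,
                         error_sd: float = 1.0, seed: int = 7) -> tuple[float, float]:
    rng = random.Random(seed)
    true_exposure = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [true_slope * x + rng.gauss(0.0, 1.0) for x in true_exposure]
    # Non-differential error: noise unrelated to the outcome is added to every measurement
    measured_exposure = [x + rng.gauss(0.0, error_sd) for x in true_exposure]
    return ols_slope(true_exposure, outcome), ols_slope(measured_exposure, outcome)


slope_true_x, slope_measured_x = simulate_attenuation()
print(f"slope with true exposure:     {slope_true_x:.2f}")
print(f"slope with measured exposure: {slope_measured_x:.2f}")
```

The simulated direction holds only under these simple assumptions; differential error, or error in several correlated covariates, can move estimates away from the null.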

Selection bias arises when the study population systematically differs from the population of interest. Cohort studies with high attrition, case-control studies using hospital controls, and cross-sectional surveys with non-random non-response all introduce selection bias. In settings with high child mortality, surviving cohort members may systematically differ from those who died or were lost to follow-up in ways that affect the outcome.

Confounding - the distortion of an association by a third variable related to both exposure and outcome - is the perennial threat in observational epidemiology. Socioeconomic position, for example, is associated with almost every dietary exposure and almost every health outcome studied in SSA contexts. Multivariable adjustment can only control for measured confounders; unmeasured confounding remains.

Generalising findings from HDSS sites or convenience samples to national populations requires care, since HDSS sites are often chosen for logistical reasons rather than for representativeness.

Frequently Asked Questions

What is the simplest way to understand the epidemiology definition? Epidemiology is the scientific study of who gets sick, where, when, and why - applied to populations rather than individuals. Its outputs inform both public health policy and clinical guidelines. The core methods involve counting cases, measuring exposures, designing studies that control for confounding, and applying the results to reduce disease burden.

What is the difference between descriptive and analytical epidemiology? Descriptive epidemiology characterises the distribution of disease in a population - its frequency, geographical spread, and demographic patterns - without testing causal hypotheses. Analytical epidemiology tests specific hypotheses about risk factors or causes, using comparative study designs such as cohort, case-control, or randomised trial. Both are necessary, and each informs the other.

Why is temporality the most important Bradford Hill criterion? Temporality - the requirement that the putative cause precede the effect - is the only criterion that cannot be compensated for by other evidence. Even an extremely strong, consistent, biologically plausible association cannot be accepted as causal if the exposure did not precede the outcome. This is why prospective cohort studies and RCTs, which establish temporal ordering by design, carry more causal weight than cross-sectional data.

How do HDSS systems contribute to epidemiology in Sub-Saharan Africa? Health and Demographic Surveillance Systems provide continuous longitudinal data on defined populations, capturing births, deaths, migrations, and key health events in settings where vital registration is incomplete. This allows the calculation of incidence rates, cause-specific mortality, and population growth metrics that cannot be derived from cross-sectional surveys or facility records alone. HDSS data have underpinned landmark analyses of child mortality, nutritional outcomes, and the epidemiological transition across the continent.