Summary
Critical appraisal and evidence-based medicine involve the practical application of clinical epidemiology concepts to guide clinical decision-making. This requires an evaluation of the quality and applicability of existing research studies to individual clinical scenarios. Appropriate interpretation of the results of a research study in the right context requires a basic understanding of the following foundational concepts (found in the “Epidemiology” article): types of epidemiological studies (e.g., observational studies, experimental studies), common study designs (e.g., case series, cohort studies, case-control studies, randomized controlled trials), causal relationships in research studies, and other reasons for observed associations (e.g., random errors, systematic errors, confounding). This article focuses on an approach to critical appraisal and on epidemiological concepts often encountered in studies of clinical interventions, i.e., measures of association (e.g., relative risk, odds ratios, absolute risk reduction, number needed to treat), measures used to evaluate screening and diagnostic tests (e.g., sensitivity, specificity, positive predictive value, negative predictive value), precision, and validity.
The following concepts are discussed separately: measures of disease frequency (e.g., incidence rates, prevalence) commonly used in studies of population health, foundational statistical concepts (e.g., measures of central tendency, measures of dispersion, normal distribution, confidence intervals), and guidance on conducting research projects.
See also “Epidemiology,” “Statistical analysis of data,” and “Population health.”
Evidence appraisal
Evidence-based medicine ^{[1]}
 Definition: The practice of medicine in which the physician uses clinical decision-making methods based on the best available current research from peer-reviewed clinical and epidemiological studies with the aim of producing the most favorable outcome for the patient.

Application in clinical practice
 Define the patient's clinical problem (can be formulated as a PICO question).
 Search for sources of information about the clinical problem.
 Perform a critical appraisal of relevant research studies.
 Apply the information
 Before discussing the research findings with the patient, consider how and to what extent the researched options can improve patient care.
 Present comprehensive but synthesized evidence to the patient using clear and understandable language.
 Engage in shared decision-making, considering the individual patient's risk profile and preferences.
Levels of evidence ^{[2]}^{[3]}
 Definition: a method used in evidence-based medicine to determine the strength of the findings from a clinical and/or epidemiological study
 Methods: Several different systems exist for assigning levels of evidence.
Levels of evidence ^{[3]}  

Level  Source of evidence  
I 
 
II  II.1 

II.2 
 
II.3 
 
III 

Grades of clinical recommendation ^{[4]}
A system developed by the US Preventive Services Task Force (USPSTF) to rate clinical evidence and create guidelines for clinical practice based on medical evidence. ^{[2]}
Grades of Recommendation ^{[4]}  

Grade  Net benefit  Level of certainty  Recommendation 
A 



B 



C 



D 



I 



Levels of certainty

Critical appraisal of research studies
Applications

Clinical practice (evidence-based medicine)
 Evaluation of the literature relevant to an individual patient's condition
 Review of updated guidelines on diagnosis and management of medical conditions
 Clinical decision-making

Research and academia
 Gathering background information for a research study
 Serving as a reviewer for a medical journal
 Participation in a journal club
Procedure
Perform an overall assessment and an in-depth analysis of the different study sections. ^{[5]}^{[6]}
Questions to ask when critically appraising a research paper ^{[7]}  

Relevant questions to address  
Overall assessment 

Title/abstract 

Introduction 

Methods 

Results 

Discussion 

Other 

Reporting guidelines are available for different study types, e.g., CONSORT for randomized trials, STROBE for observational studies, and PRISMA for systematic reviews.
Measures of association
Measures of association can be used to quantify the strength of a relationship between two variables. See also “Measures of disease frequency.”
Two-by-two table
The degree of association between exposure and disease is typically evaluated using a two-by-two table, which compares the presence/absence of disease with the history of exposure to a risk factor.
Two-by-two table  

Disease (outcome)  No disease (no outcome)  Total  
Exposure (risk factor)  a  b  a + b 
No exposure (no risk factor)  c  d  c + d 
Total  a + c  b + d  a + b + c + d 
Risk
 Risk factor: a variable or attribute that increases the probability of developing a disease or injury ^{[8]}

Absolute risk: the likelihood of an event occurring under specific conditions ^{[2]}
 Commonly expressed as a percentage
 Equal to the cumulative incidence, which can be calculated as follows: incidence rate × the time of follow-up
 Aim: to measure the probability of an individual in a study population developing an outcome
 Used in: cohort studies
 Formula: (number of new cases)/(total individuals in a study group) = (a + c)/(a + b + c + d)
 Relative risk: See “Estimates of association strength.”
 Attributable risk: See “Estimates of population impact.”
Formulas of common measures of association
 Measures that help quantify the strength of association
 Relative risk (RR): (a/(a + b))/(c/(c + d))
 Odds ratio (OR): (a/c)/(b/d) = ad/bc
 Measures that help quantify the impact of an association on a population
 Attributable risk (AR): a/(a + b) – c/(c + d)
 Absolute risk reduction (ARR): c/(c + d) – a/(a + b)
 Relative risk reduction (RRR): 1 – RR
 Number needed to treat (NNT): 1/ARR
 Number needed to harm (NNH): 1/AR
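The formulas above can be collected into one small helper. This is a minimal sketch rather than a reference implementation: the function name `association_measures` and the example counts are hypothetical, the arguments follow the a, b, c, d layout of the two-by-two table, and zero cells (which would cause division by zero) are not handled.

```python
def association_measures(a, b, c, d):
    """Compute measures of association from a two-by-two table.

    a: exposed, outcome      b: exposed, no outcome
    c: unexposed, outcome    d: unexposed, no outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    ar = risk_exposed - risk_unexposed   # excess risk in the exposed group
    arr = risk_unexposed - risk_exposed  # risk reduction in the exposed (treated) group
    return {
        "RR": rr,
        "OR": (a * d) / (b * c),
        "AR": ar,
        "ARR": arr,
        "RRR": 1 - rr,
        # NNT and NNH are only reported when the corresponding difference is positive.
        "NNT": 1 / arr if arr > 0 else None,
        "NNH": 1 / ar if ar > 0 else None,
    }


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop the outcome.
harmful = association_measures(a=30, b=70, c=10, d=90)
for name, value in harmful.items():
    print(name, None if value is None else round(value, 2))
```

With these made-up counts, the exposure is harmful (RR of about 3, OR of about 3.86, AR of about 0.20), so an NNH of about 5 is reported and the NNT is left undefined.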
Estimates of association strength
Relative risk (RR; risk ratio) ^{[2]}^{[9]}
 Description: the risk of an outcome in a group exposed to a potential risk factor compared to the risk in a group that has not been exposed

Purpose
 To measure how strongly a risk factor is associated with an outcome (e.g., death, injury, disease)
 To help establish disease etiology
 Used in: cohort studies
 Formula: (incidence of disease in exposed group)/(incidence of disease in unexposed group) = (a/(a + b))/(c/(c + d))

Interpretation
 RR = 1: Exposure neither increases nor decreases the risk of the defined outcome.
 RR > 1: Exposure increases the risk of the outcome.
 RR < 1: Exposure decreases the risk of the outcome.
Odds ratio (OR) ^{[10]}

Description
 Comparison of the odds of an event occurring in one group against the odds of an event occurring in another group
 Odds: the probability of an event occurring divided by the probability of this event not occurring
 Calculated using the twobytwo table
 Purpose: to measure the strength of an association between a risk factor and an outcome
 Used in: case-control studies

Formula

Odds ratio of exposure: compares the odds of exposure among individuals with an outcome (e.g., disease) against the odds of exposure among individuals without an outcome
 Odds of exposure in individuals with disease (i.e., case group): (exposure in individuals with disease)/(no exposure in individuals with disease) = a/c
 Odds of exposure in individuals without disease (i.e., control group): (exposure in individuals without disease)/(no exposure in individuals without disease) = b/d
 Odds ratio: (odds of exposure in individuals with disease)/(odds of exposure in individuals without disease) = (a/c)/(b/d) = ad/bc = (a/b)/(c/d)


Interpretation
 OR = 1: The outcome is equally likely in exposed and unexposed individuals.
 OR > 1: The outcome is more likely to occur in exposed individuals.
 OR < 1: The outcome is less likely to occur in exposed individuals.

Rare disease assumption
 Case-control studies do not track participants over time, so they cannot be used to calculate relative risk.
 However, the assumption can be made that if an outcome (e.g., disease prevalence) is rare, the incidence of that outcome is low and the OR is approximately the same as the RR.
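The rare disease assumption can be checked numerically. The sketch below uses two hypothetical cohorts (all counts are invented for illustration) in which the exposed group has twice the risk of the unexposed group, once with a rare outcome and once with a common one.

```python
def rr_and_or(a, b, c, d):
    """Return (relative risk, odds ratio) for a two-by-two table."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio


# Hypothetical cohorts of 10,000 exposed and 10,000 unexposed individuals.
rare = rr_and_or(20, 9980, 10, 9990)        # outcome in 0.2% vs. 0.1%
common = rr_and_or(4000, 6000, 2000, 8000)  # outcome in 40% vs. 20%

print("rare outcome:   RR = %.3f, OR = %.3f" % rare)
print("common outcome: RR = %.3f, OR = %.3f" % common)
```

In both scenarios the RR is 2, but the OR stays close to 2 only when the outcome is rare; with the common outcome it rises to about 2.67 and overstates the RR.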
Hazard ratio (HR)
 Description: a measure of the effect of an intervention on an outcome at any given point in time during the study period ^{[11]}^{[12]}
 Purpose: to help determine how long it takes for an event to occur in individuals in the case group, compared to individuals in the control group
 Used in: survival analysis
 Formula: (observed number of events in exposed group/expected number of events in exposed group) at time (t) / (observed number of events in unexposed group/expected number of events in unexposed group) at time (t) ^{[12]}

Interpretation
 HR = 1: no relationship
 HR > 1: The outcome of interest is more likely to occur in exposed individuals.
 HR < 1: The outcome of interest is less likely to occur in exposed individuals.
The RR is the risk of an event occurring by the end of the study period (i.e., cumulative risk), while the HR is the risk of an event occurring at any point in time during the study period (i.e., instantaneous risk). ^{[12]}
The RR, OR, and HR are usually displayed with a corresponding p-value. They are considered statistically significant if the p-value is < 0.05.
Estimates of population impact
Attributable risk (AR) ^{[13]}
 Description: the absolute difference between the risk of an outcome occurring in exposed individuals and unexposed individuals
 Purpose: to measure the excess risk of an outcome that can be attributed to the exposure
 Used in: cohort studies

Formulas
 Exposure AR: (incidence risk in exposed group) – (incidence risk in unexposed group) = a/(a + b) – c/(c + d)
 Population AR: (incidence risk in the study population) – (incidence risk in the unexposed group) = (a + c)/(a + b + c + d) – c/(c + d)
Attributable risk percent (ARP) ^{[13]}
 Description: the proportion of disease incidence among exposed individuals that can be attributed to the risk factor
 Purpose: to determine the proportion of cases in the exposed population that can be attributed to the risk factor
 Used in: cohort studies and case-control studies

Formulas: ((incidence risk among exposed) – (incidence risk among unexposed))/(incidence risk among exposed) × 100
 ARP = (RR – 1)/RR × 100
 The RR cannot be calculated for case-control studies, so the OR (an estimate of the RR) can be used to calculate the attributable risk percent: ARP = (OR – 1)/OR × 100.
 Alternatively, ARP = AR/(incidence of disease in the exposed group) × 100 = (a/(a + b) – c/(c + d)) / (a/(a + b)) × 100
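The two routes to the ARP (from the risks themselves and from a ratio measure) give the same result, which can be illustrated with a short sketch. The function names and counts are hypothetical; the ratio passed to `arp_from_ratio` may be an RR or, under the rare disease assumption, an OR.

```python
def arp_from_table(a, b, c, d):
    """ARP (%) from the risks in the exposed and unexposed groups."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return (risk_exposed - risk_unexposed) / risk_exposed * 100


def arp_from_ratio(ratio):
    """ARP (%) from an RR, or from an OR used as an estimate of the RR."""
    return (ratio - 1) / ratio * 100


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop the disease (RR = 3).
arp_direct = arp_from_table(30, 70, 10, 90)
arp_via_rr = arp_from_ratio(3)
print(round(arp_direct, 1), round(arp_via_rr, 1))
```

In this made-up cohort, about two-thirds of the disease incidence among exposed individuals is attributable to the exposure.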
Relative risk reduction (RRR)
 Description: the proportion by which an intervention reduces the risk in the exposure group relative to the risk in the nonexposure group
 Purpose: to determine how much the treatment reduces the risk of negative outcomes
 Used in: cohort studies and cross-sectional studies

Formulas
 1 – RR
 Alternatively, RRR = ((incidence risk in unexposed group) – (incidence risk in exposed group))/(incidence risk of disease in the unexposed group) = (c/(c + d) – a/(a + b)) / (c/(c + d))
 Example: RRR can be used to demonstrate vaccine effectiveness = (risk among unvaccinated – risk among vaccinated)/(risk among unvaccinated) × 100. ^{[9]}
Absolute risk reduction (ARR; risk difference)
 Description: the difference between the risk in the exposure group after an intervention and the risk in the nonexposure group (e.g., risk of death)
 Purpose: to show the risk without treatment as well as the risk reduction associated with treatment
 Used in: cohort studies, cross-sectional studies, and clinical trials
 Formula: (absolute risk in the unexposed group) – (absolute risk in the exposed group) = c/(c + d) – a/(a + b)
Number needed to treat (NNT)

Description
 The number of individuals that must be treated, in a particular time period, for one person to benefit from treatment (i.e., to not develop the disease)
 Inversely related to the effectiveness of a treatment
 Purpose: to compare the effectiveness of different treatments
 Used in: clinical trials
 Formula: 1/ARR
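As a worked illustration, the NNT can be computed from trial counts. This is a sketch under stated assumptions: the trial numbers are invented, the function name is hypothetical, the result is rounded up to the next whole person (a common reporting convention that is not stated in this article), and exact fractions are used so that rounding is not distorted by floating-point error.

```python
from fractions import Fraction
import math


def number_needed_to_treat(a, b, c, d):
    """NNT from a two-by-two table.

    a, b: treated individuals with/without the outcome
    c, d: untreated individuals with/without the outcome
    Returns None when the treatment does not reduce risk (ARR <= 0).
    """
    arr = Fraction(c, c + d) - Fraction(a, a + b)
    if arr <= 0:
        return None
    return math.ceil(1 / arr)  # round up to the next whole person


# Hypothetical trial: 150/1000 treated vs. 200/1000 untreated individuals have the outcome.
nnt_example = number_needed_to_treat(150, 850, 200, 800)
print(nnt_example)
```

Here the ARR is 5 percentage points, so 20 individuals would need to be treated for one to benefit.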
Number needed to harm (NNH)

Description
 The number of individuals who need to be exposed to a certain risk factor before one person develops an outcome
 Directly correlates to the safety of the exposure
 Purpose: to determine the potential harms of an intervention
 Used in: clinical trials
 Formula: 1/AR
Number needed to screen (NNS)
 Description: the number of individuals who need to be screened in a particular time period in order to prevent one death or adverse event ^{[14]}
 Formula (same as NNT): 1/ARR
Evaluation of screening or diagnostic tests
Overview
 Before a diagnostic modality (e.g., laboratory study, imaging study, diagnostic criteria) can be used in clinical practice, it needs to be determined how well the modality can distinguish between individuals with the disease and individuals without the disease.
 A test is compared to the gold standard test using a two-by-two table.
 A two-by-two table can be used to calculate a test's sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
Features of a two-by-two table summarizing screening or diagnostic test results  

Disease  No disease  Interpretation  
Positive test result 

 
Negative test result 

 
Interpretation 


Example 2 x 2 table of a diagnostic test ^{[15]}
Diagnostic test for tuberculosis (TB)  

Patients with TB  Patients without TB  Total  
Positive test result  800 (TP)  400 (FP)  1200 (TP + FP) 
Negative test result  200 (FN)  3600 (TN)  3800 (FN + TN) 
Total  1000 (TP + FN)  4000 (FP + TN)  5000 (TP + FP + FN + TN) 

Interpretation
 Sensitivity: TP/(TP + FN) = 800/(800 + 200) = 80%
 Specificity: TN/(FP + TN) = 3600/(400 + 3600) = 90%
 False positive rate: FP/(FP + TN) = 400/(400 + 3600) = 10%
 False negative rate: FN/(TP + FN) = 200/(800 + 200) = 20%
 PPV: TP/(TP + FP) = 800/(800 + 400) = 66.7%
 NPV: TN/(FN + TN) = 3600/(200 + 3600) = 94.7%
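The calculations for the TB example can be reproduced with a few lines of code. The helper name `diagnostic_metrics` is hypothetical; the counts are those from the table above.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Test characteristics from the four cells of a two-by-two table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
    }


# Counts from the TB example table.
tb = diagnostic_metrics(tp=800, fp=400, fn=200, tn=3600)
for name, value in tb.items():
    print(f"{name}: {value:.1%}")
```

Note that sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate.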
Sensitivity and specificity ^{[16]}
 Every diagnostic test generally involves a tradeoff between sensitivity and specificity.
 For a given test, sensitivity and specificity are inversely related: adjusting the cutoff to increase sensitivity generally decreases specificity, and vice versa.
Overview of sensitivity and specificity of clinical tests  

Sensitivity (epidemiology) (true positive rate)  Specificity (epidemiology) (true negative rate)  
Description 


Features 


A highly sensitive test can rule out a disease if negative, and a highly specific test can rule in a disease if positive.
Predictive values ^{[17]}
Pretest probability
 Description: the probability that a patient with a particular manifestation has a specific disease before the result of the diagnostic test is known

Features
 The pretest probability of a disease is reflected by its prevalence in a particular region.

NPVs and PPVs depend on the test subject's pretest probability of disease (unlike sensitivity and specificity).
 A higher pretest probability will decrease the NPV and increase the PPV.
 A lower pretest probability will increase the NPV and decrease the PPV.
Posttest probability
 Description: the probability that a patient has a particular disease after a diagnostic test is carried out

Features
 Combines disease prevalence and sensitivity and specificity of a test to quantify the likelihood of a patient having a disease
 Can be determined using formulas or nomograms
Positive predictive value (PPV)
 Description: the proportion of individuals who test positive for a disease that actually have that disease

Features
 The PPV increases with increasing prevalence of a disease in the population. ^{[16]}
 Directly correlates with pretest probability

Formula
 See the two-by-two table above.
 The probability that an individual who tested positive actually does not have the disease: 1 – PPV
Negative predictive value (NPV)
 Description: the proportion of individuals who test negative for a disease that are actually disease-free

Features
 The NPV decreases with increasing prevalence of the disease.
 It inversely correlates with pretest probability.

Formula
 See the two-by-two table above.
 The probability that an individual who tested negative actually has the disease: 1 – NPV
Unlike sensitivity and specificity, which are determined solely by the diagnostic test itself, predictive values are also influenced by disease prevalence.
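The dependence of predictive values on prevalence can be made concrete with a short sketch. It assumes a test with fixed sensitivity (80%) and specificity (90%), as in the TB example, and applies Bayes' theorem at three illustrative prevalences; the function name and the chosen prevalences are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per person tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Same test, three illustrative prevalences (pretest probabilities).
results = {p: predictive_values(0.80, 0.90, p) for p in (0.01, 0.20, 0.50)}
for p, (ppv, npv) in results.items():
    print(f"prevalence {p:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At a prevalence of 20%, the result matches the TB example; at 1%, most positive results are false positives even though the test itself is unchanged.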
Likelihood ratio

Description:
 A measure used to determine the utility of a diagnostic test in clinical practice
 Represents the probability of a test result in someone with the disease over the probability of the test result in someone without disease

Features
 The likelihood ratio is not influenced by disease prevalence.
 Likelihood ratios can be multiplied by the pretest odds of disease to obtain the posttest odds, which can then be converted into an estimate of the posttest probability.

Interpretation: reflects how much more likely a disease is in a person with a given test result compared to their pretest probability
 A likelihood ratio > 1 is associated with the presence of a disease.
 A likelihood ratio < 1 is associated with absence of a disease.
 If the likelihood ratio is 1, the posttest probability is similar to the pretest probability, and therefore the test has poor clinical utility.

Types

Positive likelihood ratio
 Ratio of the sensitivity rate (true positive rate) to the false positive rate
 Sensitivity/(1 – specificity) = (TP rate)/(FP rate)
 A positive likelihood ratio > 10 is suggestive of a very specific test.

Negative likelihood ratio
 Ratio between the false negative rate and the specificity (true negative rate)
 (1 – sensitivity)/specificity = (FN rate)/(TN rate)
 A negative likelihood ratio < 0.1 is suggestive of a very sensitive test.
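The sketch below shows one way to move from a pretest probability to a posttest probability with likelihood ratios. It assumes the usual odds form of Bayes' theorem (probability is converted to odds, multiplied by the likelihood ratio, and converted back); the function names are hypothetical, and the test characteristics and pretest probability are taken from the TB example (sensitivity 80%, specificity 90%, prevalence 20%).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR)."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, and convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
after_positive = posttest_probability(0.20, lr_pos)
after_negative = posttest_probability(0.20, lr_neg)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
print(f"posttest probability: {after_positive:.1%} if positive, {after_negative:.1%} if negative")
```

With these inputs, the posttest probability after a positive result equals the PPV of the TB example, and the posttest probability after a negative result equals 1 – NPV.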

Cutoff values ^{[18]}

Definition: dividing points on measuring scales where the test results are divided into different categories
 Positive: has the condition of interest
 Negative: does not have the condition of interest
 Features: Sensitivity, specificity, PPVs, and NPVs vary according to the criterion and/or the cutoff values of the data.

Interpretation: What happens when a cutoff value is raised or lowered depends on whether the test in question requires a high value (e.g., tumor marker for cancer, lipase for pancreatitis) or a low value (e.g., hyponatremia, agranulocytosis).
 Lowering or raising a cutoff value for a high value test:

Lowering or raising a cutoff value for a low value test:
 Decreased cutoff value (i.e., narrowed inclusion criteria): higher specificity, lower sensitivity, higher PPV (decrease in false positives > decrease in true positives), lower NPV (increase in false negatives > increase in true negatives)
 Increased cutoff value (i.e., broadened inclusion criteria): lower specificity, higher sensitivity, lower PPV (increase in false positives > increase in true positives), higher NPV (decrease in false negatives > decrease in true negatives)
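The effect of moving the cutoff for a low value test can be illustrated with a small, entirely hypothetical data set in which a result at or below the cutoff counts as positive. The measurements, cutoffs, and function name below are invented for illustration.

```python
def sens_spec_low_value_test(diseased, healthy, cutoff):
    """Sensitivity and specificity when values <= cutoff count as positive."""
    tp = sum(v <= cutoff for v in diseased)
    fn = len(diseased) - tp
    fp = sum(v <= cutoff for v in healthy)
    tn = len(healthy) - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical measurements of a marker that is low in diseased individuals.
diseased = [118, 122, 125, 128, 131, 134]
healthy = [129, 132, 135, 137, 139, 141, 143, 145]

by_cutoff = {c: sens_spec_low_value_test(diseased, healthy, c) for c in (125, 130, 135)}
for cutoff, (sens, spec) in by_cutoff.items():
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this example, raising the cutoff from 125 to 135 increases sensitivity from 0.50 to 1.00 while specificity falls from 1.00 to about 0.63, in line with the pattern described above for low value tests.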
Receiver operating characteristic curve (ROC curve) ^{[15]}^{[19]}
 Description: a graph that compares the sensitivity and specificity of a diagnostic test

Features
 Shows the tradeoff between clinical sensitivity and specificity for every possible cutoff value, to evaluate the ability of the test to correctly diagnose subjects

The y-axis represents the sensitivity (i.e., true positive rate) and the x-axis corresponds to 1 – specificity (i.e., the false positive rate).
 A test is considered more accurate the more closely the curve follows the y-axis and then the upper border of the plot (i.e., the closer it comes to the upper left corner).
 A test is considered less accurate if the curve is closer to the diagonal.
 The area under the ROC curve (AUROC) also allows the usefulness of tests to be compared: The larger the AUROC, the higher the accuracy of the test. ^{[20]}
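An ROC curve and its area can be computed directly from test results. The following is a minimal sketch, assuming a high value test (results at or above the threshold count as positive) and trapezoidal integration for the AUROC; the function names and scores are hypothetical, and dedicated statistical libraries would normally be used in practice.

```python
def roc_points(diseased_scores, healthy_scores):
    """(false positive rate, true positive rate) for every possible threshold."""
    thresholds = sorted(set(diseased_scores) | set(healthy_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above all scores: nobody tests positive
    for t in thresholds:
        tpr = sum(s >= t for s in diseased_scores) / len(diseased_scores)
        fpr = sum(s >= t for s in healthy_scores) / len(healthy_scores)
        points.append((fpr, tpr))
    return points


def auroc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores.
diseased_scores = [0.9, 0.8, 0.6, 0.4]
healthy_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
auc_example = auroc(roc_points(diseased_scores, healthy_scores))
print(round(auc_example, 2))
```

For these scores the AUROC is 0.85. A test that separates the two groups perfectly has an AUROC of 1, while a test whose results are distributed identically in both groups lies on the diagonal and has an AUROC of 0.5.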
Screening tests
 Used to identify disease in asymptomatic individuals (e.g., mammogram for breast cancer, Pap smear for cervical cancer)
 Should have a high sensitivity
Potential sources of bias in screening tests  

Leadtime bias  Lengthtime bias  
Description 


Example 


Solutions 


Confirmatory tests
 Confirms disease in individuals with signs or symptoms of the disease (e.g.,