How many questions are included?

Over 1,000 SBA questions covering the full FRCEM Final syllabus, with detailed explanations for every answer.

Is this just a question bank?

No — we also provide coaching on exam technique, revision strategy, and how to approach the exam mentally, especially if you've failed before.

Can I try before I subscribe?

Yes. We offer free sample questions so you can test the style and quality before committing.

I've failed multiple times. Will this help?

That's exactly who we built this for. We focus on identifying weak spots, rebuilding confidence, and giving you a strategy that actually works.

What's the refund policy?

Once you subscribe and access the content, we don't offer refunds. That's why we provide free sample questions first — so you can make sure it's right for you before committing.

What percentage of FRCEM Final is research and statistics?

Approximately 6% — about 10 questions across SLO 10 and 11. These cover diagnostic methodologies, study types, statistical techniques, and quality improvement.

What is the Jadad score?

The Jadad score (Oxford quality scoring system) assesses the quality of RCTs on a scale of 0-5. It scores randomisation, blinding, and reporting of withdrawals. A score of 3 or above indicates a high-quality trial.

How do you calculate NNT?

NNT = 1 / ARR (Absolute Risk Reduction). ARR = control event rate minus treatment event rate. Always use absolute risk reduction, never relative risk reduction.

What is the difference between audit and research?

Audit measures current practice against an existing standard and uses the PDSA cycle. Research generates new knowledge and tests hypotheses. Audit does not need ethics approval; research does.

Research & Statistics for FRCEM Final: The Complete Guide

SLO 10 and 11 account for about 10 questions on FRCEM Final — and they're some of the most commonly failed. Most candidates skip stats revision because it feels dry and intimidating. But these are genuinely free marks if you learn the basics. This guide can't cover everything you'll need, but it covers the fundamentals and the most common exam themes.

Study Types & Evidence Hierarchy

Evidence hierarchy (strongest to weakest):

Level 1a — Systematic review of RCTs
Level 1b — Individual RCT (with narrow CI)
Level 2a — Systematic review of cohort studies
Level 2b — Individual cohort study / low-quality RCT
Level 3a — Systematic review of case-control studies
Level 3b — Individual case-control study
Level 4 — Case series / case report
Level 5 — Expert opinion

RCT: Gold standard for testing treatments. Randomises participants to intervention vs control. Minimises bias through randomisation and blinding.

Cohort study: Follows groups over time. Can be prospective or retrospective. Good for prognosis and identifying risk factors. Cannot prove causation but can show strong associations.

Case-control study: Starts with the outcome and looks back for exposure. Good for rare diseases because you don't need to follow thousands of people. Quicker and cheaper than cohort studies.

Cross-sectional study: A snapshot at one point in time. Good for prevalence. Cannot show causation or temporal relationships.

Exam tip: The exam will describe a clinical scenario and ask which study design is most appropriate. Don't just pick RCT every time — match the design to the question being asked.

Diagnostic Formulae

The 2x2 table:

	Disease +	Disease −
Test +	True Positive (TP)	False Positive (FP)
Test −	False Negative (FN)	True Negative (TN)

Sensitivity = TP / (TP + FN) — how good the test is at detecting disease. A highly sensitive test rules OUT disease when negative (SnNOut).

Specificity = TN / (TN + FP) — how good the test is at confirming health. A highly specific test rules IN disease when positive (SpPIn).

PPV = TP / (TP + FP) — affected by prevalence. Higher prevalence leads to higher PPV.

NPV = TN / (TN + FN) — affected by prevalence. Lower prevalence leads to higher NPV.
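The four formulae above can be checked with a few lines of Python. This is a minimal sketch: the function names and patient counts are made up for illustration, and the second function assumes the usual Bayes' theorem rearrangement to show how prevalence moves PPV and NPV.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a given prevalence, built from expected cell proportions."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative counts only: 100 patients with disease, 900 without.
metrics = diagnostic_metrics(tp=90, fp=90, fn=10, tn=810)
print(metrics)

# Same test (90% sensitive, 90% specific) at two different prevalences.
ppv_low, npv_low = predictive_values(0.9, 0.9, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.9, 0.9, prevalence=0.30)
print(round(ppv_low, 2), round(ppv_high, 2))  # PPV rises with prevalence
print(round(npv_low, 2), round(npv_high, 2))  # NPV falls with prevalence
```

Note that the test itself has not changed between the two prevalence settings. Only the population has.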

ARR (Absolute Risk Reduction) = control event rate − treatment event rate.

NNT (Number Needed to Treat) = 1 / ARR. Always uses absolute risk reduction, never relative.

ARI (Absolute Risk Increase) = treatment adverse event rate − control adverse event rate.

NNH (Number Needed to Harm) = 1 / ARI. If NNH < NNT, the drug harms more than it helps.

Exam tip: Draw the 2x2 table. Every sensitivity/specificity question can be solved from it. Classic exam trap: pharma reps use relative risk reduction (sounds impressive) when absolute risk reduction is tiny. Always calculate NNT from ARR yourself.
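To see why relative risk reduction can mislead, here is a small sketch using invented trial numbers. The function names are illustrative, and rounding NNT up to a whole patient is a common convention rather than something the exam mandates.

```python
import math


def risk_summary(control_events, control_n, treated_events, treated_n):
    """ARR, RRR and NNT from raw event counts in each arm."""
    cer = control_events / control_n   # control event rate
    ter = treated_events / treated_n   # treatment event rate
    arr = cer - ter                    # absolute risk reduction
    rrr = arr / cer                    # relative risk reduction
    # NNT = 1 / ARR, rounded up to a whole patient.
    # The inner round() only guards against floating-point noise.
    nnt = math.ceil(round(1 / arr, 9)) if arr > 0 else None
    return {"arr": arr, "rrr": rrr, "nnt": nnt}


def number_needed_to_harm(treated_adverse_rate, control_adverse_rate):
    """NNH = 1 / ARI; returns None if the treatment causes no excess harm."""
    ari = treated_adverse_rate - control_adverse_rate
    return 1 / ari if ari > 0 else None


# Invented example: events fall from 20/1000 (2%) to 10/1000 (1%).
summary = risk_summary(20, 1000, 10, 1000)
print(f"RRR = {summary['rrr']:.0%}")   # the headline figure
print(f"ARR = {summary['arr']:.1%}")   # the figure that matters
print(f"NNT = {summary['nnt']}")

nnh = number_needed_to_harm(0.03, 0.01)
print(f"NNH = {nnh:.0f}")
```

A 50% relative reduction here corresponds to treating 100 patients to prevent one event, which is the contrast the exam trap relies on.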

ROC Curves & AUC

ROC (Receiver Operating Characteristic) curve: Plots sensitivity (y-axis) against 1−specificity (x-axis) at different diagnostic thresholds. A curve that bows toward the top-left corner indicates a good test. The diagonal line represents chance (a useless test).

AUC (Area Under the Curve) measures overall test accuracy:

0.5 = useless (follows the diagonal — no better than chance)
0.7–0.8 = acceptable
0.8–0.9 = excellent
>0.9 = outstanding
1.0 = perfect test

The top-left corner represents a perfect test. The diagonal line represents a test no better than chance. If the exam shows two ROC curves, the one with the larger AUC is the better test.

Exam tip: ROC curves are commonly tested. If the curve hugs the top-left corner, it's a good test. If it follows the diagonal, it's useless.
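You will never need to compute an AUC by hand in the exam, but seeing it done can make the concept stick. The sketch below builds ROC points from made-up test scores and estimates the area with the trapezoidal rule. The function names, scores and thresholds are all illustrative assumptions, and a test is treated as positive when the score is at or above the threshold.

```python
def roc_points(scores_diseased, scores_healthy, thresholds):
    """(1 - specificity, sensitivity) pairs, one per threshold, plus the two corners."""
    points = [(0.0, 0.0), (1.0, 1.0)]
    for t in thresholds:
        sensitivity = sum(s >= t for s in scores_diseased) / len(scores_diseased)
        false_positive_rate = sum(s >= t for s in scores_healthy) / len(scores_healthy)
        points.append((false_positive_rate, sensitivity))
    return sorted(set(points))


def auc(points):
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores: higher values suggest disease.
diseased = [5, 6, 7, 8]
healthy = [1, 2, 3, 6]
points = roc_points(diseased, healthy, thresholds=[2, 4, 6, 8])
print(auc(points))                                   # 0.875

print(auc([(0.0, 0.0), (1.0, 1.0)]))                 # 0.5, the chance diagonal
print(auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))     # 1.0, a perfect test
```

With only four thresholds the estimate is coarse; using every distinct score as a threshold would give a finer curve.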

Power & Sample Size

Power = 1 − β = the probability of detecting a true difference if one exists. Conventionally set at 80% (0.8).

α (alpha) = significance level (usually 0.05) = probability of Type I error (false positive — finding a difference that doesn't exist).

β (beta) = probability of Type II error (false negative — missing a real effect).

Power calculation components:

Effect size (must be clinically meaningful)
Alpha level
Desired power
Variance in the data

How each component affects sample size:

Bigger effect size = smaller sample needed
Smaller alpha = larger sample needed
Higher power = larger sample needed
More variance = larger sample needed

Power calculations are performed before the study to determine sample size.
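The four relationships above can be demonstrated numerically. The sketch below assumes one common textbook approximation for comparing two means with equal group sizes, n per group = 2 × (z for alpha + z for power)² × SD² / difference². The formula choice, function name and example numbers are assumptions for illustration, not something the exam asks you to calculate.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / effect_size) ** 2
    return math.ceil(n)


baseline = n_per_group(effect_size=5, sd=10)
print(baseline)                                      # 63

print(n_per_group(effect_size=10, sd=10))            # bigger effect: smaller sample
print(n_per_group(effect_size=5, sd=10, alpha=0.01)) # smaller alpha: larger sample
print(n_per_group(effect_size=5, sd=10, power=0.90)) # higher power: larger sample
print(n_per_group(effect_size=5, sd=20))             # more variance: larger sample
```

Each line after the baseline changes one component only, so the direction of the change in sample size can be read off directly.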

Lowering a diagnostic threshold (e.g. troponin cut-off) increases sensitivity but reduces specificity, so there are more false positives (the diagnostic equivalent of a Type I error).

Exam tip: Type I = false positive (α). Type II = false negative (β). Power = 1 − β. If a study finds 'no difference' with a small sample — was it adequately powered? Consider how each component affects the validity of the findings.

Trial Design Concepts

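Concealed allocation: The person enrolling a patient cannot know or predict the next group assignment until the patient has been entered into the trial, e.g. sealed opaque envelopes or a central randomisation service. Prevents selection bias at enrolment. Not the same as blinding, which hides the assignment after allocation.

Randomisation: Assigning participants to groups by chance, AFTER enrolment. Ensures groups are comparable at baseline.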

Blinding:

Single-blind: patient doesn't know their group
Double-blind: patient + researcher don't know
Triple-blind: patient + researcher + assessor don't know

Blinding reduces observer bias and placebo effects.

Surrogate endpoint: A proxy marker used instead of the actual clinical outcome. E.g. DVT on Doppler as a surrogate for PE in pregnancy (where CTPA radiation is a concern).

Kappa coefficient: Measures inter-rater reliability. 0 = no agreement beyond chance. 0.5 = moderate agreement. 1.0 = perfect agreement.
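As a worked illustration of "agreement beyond chance", the sketch below computes Cohen's kappa for two raters making a yes/no call, using kappa = (observed agreement − expected agreement) / (1 − expected agreement). The function name and counts are invented for the example.

```python
def cohen_kappa(both_yes, only_a_yes, only_b_yes, both_no):
    """Cohen's kappa for two raters (A and B) and a binary judgement."""
    n = both_yes + only_a_yes + only_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n   # how often rater A says yes
    b_yes = (both_yes + only_b_yes) / n   # how often rater B says yes
    # Agreement expected by chance alone, given each rater's yes/no rates.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Invented example: two clinicians read 100 X-rays and agree on 80 of them.
kappa = cohen_kappa(both_yes=40, only_a_yes=10, only_b_yes=10, both_no=40)
print(round(kappa, 2))   # 0.6
```

The raters agree 80% of the time, but half of that agreement would be expected by chance, which is why kappa comes out lower than the raw agreement.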

Demographic table: Compares baseline characteristics between groups using p-values. You WANT p > 0.05 — this means the groups are comparable.

Exam tip: If the demographic table shows p < 0.05, the groups are NOT comparable — this is a confounding issue, not a good result.

Types of Bias

Selection bias: Systematic differences in who is recruited. Groups are not representative of the target population.
Recall bias: Participants with disease remember exposures differently from those without. Common in case-control studies.
Observer/detection bias: Assessors measure outcomes differently based on knowledge of group allocation. Prevented by blinding.
Attrition bias: Systematic differences in dropouts between groups. If sicker patients drop out of the treatment arm, results look falsely good.
Publication bias: Positive studies are more likely to be published than negative ones. Detected with funnel plots.
Lead-time bias: Earlier detection appears to increase survival time without actually changing outcomes. Common in screening studies.
Hawthorne effect: Participants change behaviour because they know they're being observed, not because of the intervention itself.

Confounding: A third variable affects both the exposure and the outcome, creating a spurious association. Controlled by randomisation, matching, stratification, or multivariate analysis.

Exam tip: The exam may describe a study and ask you to identify the type of bias. Read the scenario carefully — the bias is usually in the methodology, not the results.

Assessing Study Quality — The Jadad Score

The Jadad score (Oxford quality scoring system) assesses RCT quality on a scale of 0–5:

Was the study described as randomised? (+1)
Was the randomisation method appropriate? (+1) — e.g. computer-generated, NOT alternate allocation
Was the study described as double-blind? (+1)
Was the blinding method appropriate? (+1) — e.g. identical placebo
Were withdrawals and dropouts described? (+1) — numbers and reasons in each group

Deductions: −1 if randomisation method inappropriate. −1 if blinding method inappropriate.

Score ≤2 = low quality. Score ≥3 = high quality.
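The scoring rules above translate directly into a short function. This is only a sketch of the rules as listed in this guide: the function and parameter names are hypothetical, and each method argument takes "appropriate", "inappropriate", or None when the paper does not describe the method.

```python
def jadad_score(randomised, randomisation_method, double_blind,
                blinding_method, withdrawals_described):
    """Jadad score (0 to 5) from the five questions listed above."""
    method_points = {"appropriate": 1, "inappropriate": -1, None: 0}
    score = 0
    if randomised:
        score += 1 + method_points[randomisation_method]
    if double_blind:
        score += 1 + method_points[blinding_method]
    if withdrawals_described:
        score += 1
    return score


def jadad_quality(score):
    """Apply the cut-off used in this guide: 3 or more is high quality."""
    return "high" if score >= 3 else "low"


# Computer-generated randomisation, double-blind with no method described,
# withdrawals reported.
good_trial = jadad_score(True, "appropriate", True, None, True)
print(good_trial, jadad_quality(good_trial))   # 4 high

# Alternate allocation (inappropriate), unblinded, withdrawals reported.
weak_trial = jadad_score(True, "inappropriate", False, None, True)
print(weak_trial, jadad_quality(weak_trial))   # 1 low
```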

Other quality assessment tools:

CONSORT: Reporting standard for RCTs
PRISMA: Reporting standard for systematic reviews
GRADE: Overall evidence quality rating system
Newcastle-Ottawa: Quality assessment for cohort and case-control studies

Exam tip: The exam tests whether you can identify study limitations, not just name the quality tools. But know the Jadad score components — it's the most commonly tested.

Interpreting Results

P-value <0.05 = statistically significant by convention. This does NOT mean clinically important — a huge trial can find a tiny, meaningless difference that reaches statistical significance.

Confidence interval: The range where the true value likely lies. If the 95% CI crosses 1.0 (for ratios like RR/OR) or 0 (for differences) — the result is not significant, regardless of the p-value.

Narrow CI = precise estimate
Wide CI = imprecise estimate (often from small sample)
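To make the "crosses 1.0" rule concrete, the sketch below calculates a relative risk and its 95% CI from trial counts. It assumes the standard log-transformation method for the standard error of ln(RR); the function name and trial numbers are invented for illustration.

```python
import math
from statistics import NormalDist


def relative_risk_ci(treated_events, treated_n, control_events, control_n, level=0.95):
    """Relative risk with a confidence interval calculated on the log scale."""
    rr = (treated_events / treated_n) / (control_events / control_n)
    se_log_rr = math.sqrt(
        1 / treated_events - 1 / treated_n + 1 / control_events - 1 / control_n
    )
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% CI
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def crosses_no_effect(lower, upper):
    """True if the interval includes 1.0, the 'no effect' value for a ratio."""
    return lower <= 1.0 <= upper


# Invented trial A: 15/100 events on treatment vs 30/100 on control.
rr_a, lo_a, hi_a = relative_risk_ci(15, 100, 30, 100)
print(round(rr_a, 2), crosses_no_effect(lo_a, hi_a))   # 0.5 False

# Invented trial B: 15/100 events on treatment vs 20/100 on control.
rr_b, lo_b, hi_b = relative_risk_ci(15, 100, 20, 100)
print(round(rr_b, 2), crosses_no_effect(lo_b, hi_b))   # 0.75 True
```

Trial B shows a lower event rate on treatment, but its interval includes 1.0, so the result is not statistically significant.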

Heterogeneity (I²) in meta-analysis: >50% is significant — you should question whether the studies should be combined at all.
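For context on where the percentage comes from, I² is usually derived from Cochran's Q statistic as (Q − df) / Q × 100, with df being the number of studies minus one and negative values set to zero. The sketch below assumes that standard definition; the function name and Q values are illustrative.

```python
def i_squared(q_statistic, n_studies):
    """I² (%) from Cochran's Q: the share of variation not explained by chance."""
    degrees_of_freedom = n_studies - 1
    if q_statistic <= 0:
        return 0.0
    return max(0.0, (q_statistic - degrees_of_freedom) / q_statistic) * 100


print(i_squared(q_statistic=20, n_studies=6))   # 75.0, above the 50% threshold
print(i_squared(q_statistic=4, n_studies=6))    # 0.0, no more variation than chance
```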

Intention-to-treat (ITT) vs per-protocol: ITT analyses everyone enrolled (even dropouts). It's more conservative and avoids attrition bias. Per-protocol only analyses those who completed the study — risks overestimating treatment effect.

Exam tip: Statistical significance is not the whole story. If a reported p-value and the 95% CI appear to disagree, trust the CI: it shows both whether the 'no effect' line is crossed and how precise the estimate is. Always check both.

Graph Types

Forest plot (blobogram): Displays meta-analysis results. Each horizontal line represents one study — the square is the point estimate (bigger square = larger study = more weight), the line is the confidence interval. The diamond at the bottom is the overall pooled effect. A vertical line at 1.0 represents no effect. If the diamond crosses this line, the overall result is not significant.

Funnel plot: Used BEFORE meta-analysis to detect publication bias. Studies are plotted by effect size (x-axis) vs precision/sample size (y-axis). In an unbiased sample, dots should be symmetrically distributed around the central line like an inverted funnel. Asymmetry suggests publication bias — smaller studies with negative results may not have been published.

Kaplan-Meier curve: Survival analysis over time. The y-axis is survival probability, x-axis is time. Steps downward represent individual events (e.g. deaths). Two curves are plotted (treatment vs control) — separation between the curves indicates a treatment effect. The greater the separation, the larger the effect.
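The step pattern in a Kaplan-Meier curve comes from a simple running product: at each event time, survival is multiplied by the fraction of at-risk patients who did not have the event. The sketch below is a minimal version with made-up follow-up data. It also handles censored patients (those lost to follow-up without an event), which is standard for this method but an addition to the description above.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] with one entry per event time.

    times  : follow-up time for each patient
    events : True if the patient had the event, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)   # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk          # a downward step
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up data: five patients, two of them censored (at months 3 and 8).
times = [2, 3, 3, 5, 8]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
for month, probability in curve:
    print(month, round(probability, 2))
# 2 0.8
# 3 0.6
# 5 0.3
```

Censored patients do not cause a step, but they shrink the number at risk, so later events produce bigger drops.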

Scatter plot: Shows the relationship between two continuous variables. Dots trending upward = positive correlation, downward = negative, no pattern = no correlation. A line of best fit may be drawn. The r-value (correlation coefficient) measures strength: closer to 1 or -1 = stronger. Important: correlation does NOT equal causation.
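The r-value mentioned above is Pearson's correlation coefficient. A minimal sketch of the calculation, with invented data and an illustrative function name:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


dose = [1, 2, 3, 4]
print(round(pearson_r(dose, [2, 4, 6, 8]), 2))   # 1.0, perfect positive
print(round(pearson_r(dose, [8, 6, 4, 2]), 2))   # -1.0, perfect negative
print(round(pearson_r(dose, [3, 1, 4, 2]), 2))   # 0.0, no linear relationship
```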

Box & whisker plot: Summarises the distribution of data. The line inside the box = median. The box edges = interquartile range (IQR, 25th to 75th percentile). The whiskers usually extend to the most extreme values within 1.5 × IQR of the box (or to the full range if no outliers are plotted). Dots beyond the whiskers = outliers. Useful for comparing distributions between groups: if the boxes don't overlap, the groups probably differ, although a formal test is needed to confirm significance.

ROC curve: See the ROC Curves & AUC section above for full detail, including AUC values.

Exam tip: The exam won't just ask you to name a graph. It will show you one and ask what conclusion you can draw, whether the result is significant, or what a specific component means. Practise interpreting each type, not just recognising them.

Quality Improvement

Audit: Measures current practice against an existing STANDARD. Uses the PDSA (Plan-Do-Study-Act) cycle. Must re-audit to close the loop. Does NOT need ethics approval.

Research: Generates NEW knowledge. Tests a hypothesis. NEEDS ethics approval.

QI methodologies:

EBCD (Experience-Based Co-Design): Uses patient and staff experiences to drive improvement
Driver diagrams: Primary aim → primary drivers → secondary drivers → change ideas
5 Whys / 3 Whys: Root cause analysis — keep asking "why?" until you reach the underlying cause
Ishikawa (fishbone) diagram: Categorises causes into groups (people, process, equipment, environment, etc.)
Pareto analysis (80/20 rule): 80% of problems come from 20% of causes. Fix the top causes first for maximum impact.
Process mapping: Visualising the patient journey to identify bottlenecks and waste
Run charts / SPC charts: Tracking improvement over time. Distinguishes special cause variation (something changed) from common cause variation (normal fluctuation).
Gantt chart: Project timeline showing tasks, duration, and dependencies
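Pareto analysis from the list above is easy to try on real departmental data. The sketch below ranks causes by frequency and adds a cumulative percentage, which is what a Pareto chart plots. The causes, counts and function name are invented for illustration.

```python
def pareto_table(cause_counts):
    """Rows of (cause, count, cumulative %) sorted from most to least frequent."""
    total = sum(cause_counts.values())
    running = 0
    rows = []
    ranked = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    for cause, count in ranked:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Invented audit of 100 delayed discharges.
delays = {
    "Equipment fault": 10,
    "Delayed triage": 45,
    "Other": 7,
    "Missing notes": 30,
    "Staffing gap": 8,
}
rows = pareto_table(delays)
for cause, count, cumulative in rows:
    print(f"{cause:<16} {count:>3} {cumulative:>6}%")
```

In this made-up example the top two causes account for 75% of the delays, so they would be the first targets for a change idea.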

You do NOT need 100% of data for PDSA — just enough, collected sequentially, to test your change idea.

Exam tip: Audit vs research: is there a pre-existing standard being measured against? If yes = audit. If generating new knowledge = research.

These are free marks

Most candidates panic when they see a stats question and guess. If you learn the formulae, recognise the graph types, and understand the basic concepts, you'll pick up 5–10 marks that others throw away. That's the difference between passing and failing.

Recommended Reading

"Critical Appraisal for FCEM" — Bootland, Coughlan, Galloway & Goubet (CRC Press). Covers everything you need for SLO 10/11 in plain language.

Ready to practise?

More advice and 1,000+ questions mapped to the FRCEM blueprint. Not just a question bank — support, guidance and help where you need it.
