Clinical Trial Lifecycle Explained: From Protocol Development to SAP and CSR
- Mayta

1) Protocol Development + Sample Size
A. Scientific + clinical foundation
Define study rationale (unmet need, mechanism, prior evidence, feasibility).
Translate to objectives:
Primary objective (one “win condition”)
Key secondary objectives (ranked)
Exploratory objectives (biomarkers, PROs, substudies)
Define endpoint strategy
Primary endpoint (precise definition, timepoint, ascertainment)
Secondary endpoints (hierarchy / multiplicity plan)
Safety endpoints (AEs, SAEs, AESIs, lab/vitals/ECG)
Endpoint adjudication plan (if needed)
B. Trial design decisions (core protocol architecture)
Choose design: parallel RCT / single-arm / crossover / cluster / pragmatic vs explanatory
Define:
Population (inclusion/exclusion, washout, prohibited meds)
Study setting (sites, countries, recruitment pathway)
Treatment arms (dose, schedule, rescue meds, allowed concomitants)
Randomization (ratio, stratification factors, blocking; see the sketch after this list)
Blinding (double-blind, double-dummy, open-label + blinded assessor)
Visit schedule + windows (procedures per visit)
Adherence and compliance measurement plan
Stopping rules / discontinuation criteria
Safety oversight: DSMB/IDMC charter needs, AE reporting timelines
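To make the randomization item concrete, here is a minimal Python sketch of stratified permuted-block randomization for a 1:1 two-arm trial. The strata, block size, and per-stratum enrolment numbers are illustrative assumptions; a production IWRS/RTSM would also handle block truncation, unequal ratios, and emergency unblinding.

```python
import random

def stratified_block_randomization(strata, block_size=4, seed=2024):
    """Permuted-block randomization within each stratum for a 1:1 two-arm trial.
    Each block holds an equal number of A and B assignments in random order."""
    rng = random.Random(seed)
    schedule = {}
    for stratum, n_subjects in strata.items():
        assignments = []
        while len(assignments) < n_subjects:
            block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
            rng.shuffle(block)
            assignments.extend(block)
        # Truncating the final block is a simplification for illustration
        schedule[stratum] = assignments[:n_subjects]
    return schedule

# Hypothetical strata (site x severity) with planned enrolment per stratum
print(stratified_block_randomization({"site1/mild": 8, "site1/severe": 6}))
```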
C. Statistical framework in the protocol (high-level)
Define estimand logic (what effect you want: treatment policy vs hypothetical, etc.)
Specify analysis populations (ITT, PP, Safety) at a high level
Outline primary model (e.g., logistic / Cox / MMRM / ANCOVA), alpha, CI
D. Sample size workflow (from endpoint → assumptions → n)
Select the primary endpoint metric and effect measure:
Binary (risk difference/ratio, OR), time-to-event (HR), continuous (mean diff), repeated measures (MMRM)
Determine assumptions:
Control event rate / mean & SD
Expected treatment effect (clinically meaningful)
Alpha (two-sided vs one-sided), power
Allocation ratio (1:1, 2:1)
Dropout/non-evaluable inflation
Special scenarios to account for:
Stratification/cluster effects (ICC), multi-center variation
Interim analysis (alpha spending)
Non-inferiority margin logic (if NI)
Multiple primary endpoints / multiplicity
Write sample size justification narrative (clinically interpretable, defensible)
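As a worked illustration of the workflow above, the sketch below computes a per-arm sample size for a binary primary endpoint using the standard normal-approximation formula for two proportions, then inflates for dropout. The event rates, alpha, power, and dropout figures are placeholder assumptions, not recommendations.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_arm_two_proportions(p_control, p_treat, alpha=0.05, power=0.80, dropout=0.10):
    """Per-arm sample size for a two-sided test of two proportions (1:1 allocation),
    inflated for anticipated dropout. Normal-approximation formula."""
    z_alpha = norm.ppf(1 - alpha / 2)          # two-sided critical value
    z_beta = norm.ppf(power)                   # power quantile
    p_bar = (p_control + p_treat) / 2          # pooled proportion
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))) ** 2
    n_evaluable = numerator / (p_control - p_treat) ** 2
    return ceil(n_evaluable / (1 - dropout))   # inflate so the evaluable n survives dropout

# Illustrative assumptions: 30% control event rate, 45% on treatment, 10% dropout
print(n_per_arm_two_proportions(0.30, 0.45))
```

Equivalent calculations exist for continuous and time-to-event endpoints; the justification narrative should report the formula or software used alongside every assumption.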
E. Operational + compliance deliverables
Protocol synopsis (one-page)
Full protocol + schedule of assessments (SoA)
Informed consent / assent materials
Investigator brochure or reference safety info
High-level randomization plan (details often in the SAP / randomization spec)
Outputs
Final protocol (and amendment plan)
Sample size report (assumptions + formula/software + inflation)
Draft “TLF shells” outline (optional but useful early)
2) Project Set-Up & Data Management
A. Project set-up (operations & governance)
Build project plan: timelines, milestones, critical path (FPI/LPI/LPLV/DBL/topline/CSR)
Define roles and RACI: sponsor/CRO/PI/statistician/DM/medical monitor
Vendor qualification + contracts:
EDC, IWRS/RTSM (randomization), central lab, imaging, ePRO, PK lab, safety database
Site feasibility + selection:
Site capability, recruitment, competing trials, staff training needs
Trial master file (TMF) structure + QC plan
Training:
Protocol training, GCP training, EDC training, safety reporting training
B. Data management planning (DMP)
Create Data Management Plan (DMP):
Data sources (EDC, labs, devices, ePRO, imaging)
Data flow diagrams + transfer specs
Data standards (CDISC expectations if applicable)
Quality strategy (edit checks, query process, cleaning cycles)
Coding plan (MedDRA for AEs, WHO-DD for meds)
Reconciliation plans (SAE vs EDC, lab vs EDC)
Database lock criteria & checklist
C. CRF/eCRF + database build
Build CRF from protocol endpoints and SoA (no “nice-to-have” fields)
Annotated CRF (mapping to dataset variables)
EDC build + validations:
Edit checks, range checks, visit windows, missing prompts
UAT (user acceptance testing) with documented cases + sign-off
Role-based access + audit trail configuration
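A minimal sketch of what programmatic edit checks do, assuming a hypothetical vital-signs eCRF extract; real EDC systems implement these as configurable validation rules rather than ad-hoc scripts, but the logic is the same.

```python
import pandas as pd

# Hypothetical vital-signs eCRF extract
vitals = pd.DataFrame({
    "subject": ["001", "002", "003"],
    "visit_day": [1, 15, 45],
    "systolic_bp": [128, 310, None],
})

queries = []

# Range check: flag physiologically implausible systolic blood pressure
out_of_range = vitals[(vitals["systolic_bp"] < 60) | (vitals["systolic_bp"] > 260)]
queries += [f"{r.subject}: systolic_bp={r.systolic_bp} outside 60-260" for r in out_of_range.itertuples()]

# Missing-value prompt
missing = vitals[vitals["systolic_bp"].isna()]
queries += [f"{r.subject}: systolic_bp missing at day {r.visit_day}" for r in missing.itertuples()]

print(queries)
```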
D. Data cleaning + ongoing quality
Medical coding cycles (regular cadence)
Query management + KPI tracking
Central monitoring (RBM) checks:
Outliers, fraud signals, protocol deviations, missingness patterns
Data reviews:
Monthly listings review, blinded data review meetings (BDRM)
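One way central monitoring can screen for outlying sites, sketched below with a hypothetical site-level missingness metric and a robust (median/MAD-based) z-score; the actual metrics and thresholds would be defined in the monitoring plan.

```python
import pandas as pd

# Hypothetical site-level metric from central monitoring
# (fraction of missing key fields per site)
metrics = pd.DataFrame({
    "site": ["S01", "S02", "S03", "S04", "S05"],
    "missing_rate": [0.02, 0.03, 0.25, 0.02, 0.04],
})

# Robust outlier screen: modified z-score based on the median absolute deviation
med = metrics["missing_rate"].median()
mad = (metrics["missing_rate"] - med).abs().median()
metrics["robust_z"] = 0.6745 * (metrics["missing_rate"] - med) / mad
metrics["flag"] = metrics["robust_z"].abs() > 3.5
print(metrics)
```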
E. Database lock readiness
Lock preparation:
All queries resolved/closed
SAE reconciliation complete
External data reconciliation complete
Protocol deviations finalized
Database lock (DBL) execution + documentation
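The SAE reconciliation step can be as simple as an outer join between the safety database and EDC extracts, as in this sketch with hypothetical case listings; unmatched rows become reconciliation queries.

```python
import pandas as pd

# Hypothetical extracts: SAE cases from the safety database and from the EDC
safety_db = pd.DataFrame({"subject": ["101", "102", "105"],
                          "sae_term": ["Pneumonia", "Myocardial infarction", "Sepsis"]})
edc = pd.DataFrame({"subject": ["101", "103", "105"],
                    "sae_term": ["Pneumonia", "Fall", "Sepsis"]})

# Outer merge flags cases recorded in only one source; each becomes a reconciliation query
recon = safety_db.merge(edc, on=["subject", "sae_term"], how="outer", indicator=True)
print(recon[recon["_merge"] != "both"])
```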
Outputs
DMP, CRF/eCRF, edit-check specs
Validated EDC database + audit trail
Clean, locked analysis-ready data extracts
3) Data Analysis + SAP (extra-detailed SAP breakdown)
A. Where SAP sits
Protocol = what/why (high-level stats)
SAP = exactly how you will analyze, pre-specified before database lock
Programming specs (sometimes separate) = variable derivations + dataset build rules
B. SAP: recommended structure (what to include)
1) Administrative & governance
SAP version history, author/reviewer approvals, effective date
Links to protocol version, amendments, and rationale for SAP updates
Blinding status for statisticians (who is blinded, who is unblinded)
2) Study overview (tight summary)
Design, arms, randomization, stratification factors, visit schedule
Primary objective/endpoint (exact wording consistent with protocol)
3) Estimands (increasingly required)
For each primary (and key secondary) endpoint:
Population (who)
Treatment condition (what comparison)
Variable (endpoint definition)
Intercurrent events handling (e.g., rescue medication, treatment discontinuation)
Summary measure (difference in means, HR, OR, etc.)
4) Analysis populations (define precisely)
ITT / Full Analysis Set: all randomized, analyzed as assigned
Per-Protocol (PP): exact criteria (adherence thresholds, major deviations)
Safety: all who received ≥1 dose (as treated)
Optional:
mITT (only if defensible + pre-specified; define clearly)
Pharmacokinetic set / biomarker set
5) Trial conduct rules used in analysis
Definition and classification of protocol deviations
Major vs minor, who adjudicates, timing of finalization
Handling of mis-randomization and mistaken inclusions
6) General statistical principles
Significance level (alpha), two-sided vs one-sided
Confidence intervals approach
Multiplicity strategy:
Hierarchical testing / gatekeeping / Bonferroni / Hochberg, etc.
Covariate adjustment principles (pre-specified covariates, stratification factors)
Center effects handling (fixed vs random effects; pooling rules)
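For the fixed multiplicity adjustments named above, here is a small sketch using statsmodels; the unadjusted p-values are hypothetical, and a gatekeeping or hierarchical strategy would instead be written as an ordered testing sequence rather than an adjustment of p-values.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical unadjusted p-values for three key secondary endpoints
p_values = [0.012, 0.034, 0.049]

for method in ("bonferroni", "holm", "simes-hochberg"):
    reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, p_adjusted.round(4), reject)
```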
7) Data handling conventions
Baseline definition rules (visit windows; last observation prior to first dose)
Derived variables rules (change from baseline, time-to-event definitions)
Outliers (detection rules; usually retained rather than excluded, and handled via sensitivity analyses)
Transformations (log transform rules)
Concomitant medications coding rules
Rescue medication rules (and how they impact estimands)
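A sketch of the baseline and change-from-baseline conventions above, assuming baseline is defined as the last non-missing observation strictly before first dose; the subjects, dates, and values are hypothetical.

```python
import pandas as pd

# Hypothetical longitudinal lab values plus first-dose date per subject
obs = pd.DataFrame({
    "subject": ["001"] * 4 + ["002"] * 3,
    "date": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-02-01", "2024-03-01",
                            "2024-01-05", "2024-02-05", "2024-03-05"]),
    "value": [10.0, 12.0, 9.0, 8.0, 20.0, 18.0, 15.0],
})
first_dose = pd.Series(pd.to_datetime(["2024-01-12", "2024-01-06"]),
                       index=["001", "002"], name="first_dose")
obs = obs.merge(first_dose, left_on="subject", right_index=True)

# Baseline = last non-missing observation strictly before first dose
pre_dose = obs[obs["date"] < obs["first_dose"]]
baseline = pre_dose.sort_values("date").groupby("subject")["value"].last().rename("baseline")
obs = obs.merge(baseline, left_on="subject", right_index=True)

# Change from baseline for on-treatment assessments
obs["chg"] = obs["value"] - obs["baseline"]
print(obs.loc[obs["date"] >= obs["first_dose"], ["subject", "date", "value", "baseline", "chg"]])
```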
8) Missing data strategy (must be explicit)
Missingness assumptions: MCAR/MAR/MNAR (what you assume and why)
Primary approach by endpoint type:
Continuous repeated measures: MMRM (valid under a MAR assumption)
Binary: multiple imputation / tipping point / non-responder imputation (if relevant)
Time-to-event: censoring rules (precise)
Sensitivity analyses:
Worst-case / best-case
Pattern mixture models / delta-adjustment
Tipping point analyses
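A minimal single-imputation sketch of a delta-adjustment / tipping-point scan for a continuous endpoint: dropouts in the treatment arm are imputed at the observed treatment mean penalised by an increasing delta until significance is lost. A real SAP would typically specify multiple imputation and the exact delta grid; the simulated data here are purely illustrative.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

# Hypothetical change-from-baseline data; NaN marks dropouts in the treatment arm
treat = np.concatenate([rng.normal(-4.0, 6.0, 80), np.full(20, np.nan)])
control = rng.normal(-1.5, 6.0, 100)
observed_treat = treat[~np.isnan(treat)]

# Delta-adjustment scan: larger delta = less favourable imputed outcomes
for delta in range(0, 7):
    imputed = np.where(np.isnan(treat), observed_treat.mean() + delta, treat)
    p_value = ttest_ind(imputed, control).pvalue
    print(f"delta={delta}: p={p_value:.4f}")
```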
9) Primary endpoint analysis (very specific)
For the primary endpoint, specify:
Statistical model (exact)
Estimand alignment
Covariates included (and justification)
Hypothesis statement
Effect estimate + CI reporting
Diagnostics (model checks) and fallback methods if assumptions fail
Examples of the “detail level”:
If continuous: ANCOVA vs MMRM, baseline adjustment, visit-by-treatment interaction
If time-to-event: Cox model + proportional hazards checks; censoring; KM summaries
If binary: logistic regression; risk difference estimation method; exact tests if sparse
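Taking the binary case as an example of the required level of detail, the sketch below fits a covariate-adjusted logistic regression and reports the odds ratio with a 95% CI on simulated data; the covariates, stratification factor, and simulation parameters are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 300

# Simulated analysis dataset: arm, pre-specified covariate, stratification factor
df = pd.DataFrame({
    "arm": rng.integers(0, 2, n),                    # 1 = treatment, 0 = control
    "baseline_severity": rng.normal(50, 10, n),
    "region": rng.choice(["EU", "US", "ASIA"], n),
})
linpred = -1.0 + 0.6 * df["arm"] + 0.02 * (df["baseline_severity"] - 50)
df["responder"] = rng.binomial(1, 1 / (1 + np.exp(-linpred)))

# Logistic regression adjusted for the pre-specified covariate and stratification factor
fit = smf.logit("responder ~ arm + baseline_severity + C(region)", data=df).fit()

# Treatment effect reported as an odds ratio with 95% CI
odds_ratio = np.exp(fit.params["arm"])
ci_low, ci_high = np.exp(fit.conf_int().loc["arm"])
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```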
10) Secondary endpoints analysis
List each endpoint with:
Model
Multiplicity placement (hierarchy rank or adjusted p-value approach)
Timepoints and summaries
11) Subgroup analyses (pre-specify, don’t fish)
Subgroups: sex, age bands, severity strata, biomarker status, region, etc.
Method: interaction tests (treatment × subgroup)
Forest plot conventions and interpretation cautions
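A sketch of the treatment-by-subgroup interaction test mentioned above, using simulated data with a treatment effect but no true sex interaction; in practice the subgroup list, coding, and model form would mirror the primary analysis model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 400

# Simulated data: treatment effect of 2 units, no sex-by-treatment interaction
df = pd.DataFrame({
    "arm": rng.integers(0, 2, n),
    "sex": rng.choice(["F", "M"], n),
})
df["outcome"] = 2.0 * df["arm"] + rng.normal(0, 5, n)

# Interaction model: the treatment effect is allowed to differ by subgroup
fit = smf.ols("outcome ~ arm * C(sex)", data=df).fit()
print("interaction p-value:", round(fit.pvalues["arm:C(sex)[T.M]"], 3))
```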
12) Sensitivity analyses (must be pre-specified)
PP analysis, as-treated analysis (usually supportive)
Alternative missing data assumptions
Alternative model forms (robust regression, nonparametric)
If noncompliance is substantial: consider CACE (complier average causal effect) as a supportive analysis (define how it is estimated)
13) Interim analyses (if applicable)
Timing rules (information fraction, event count)
Alpha spending function / boundaries
Who is unblinded, and what reports are generated
Operational firewall procedures
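As an illustration of alpha spending, the sketch below evaluates the O'Brien-Fleming-type (Lan-DeMets) spending function at hypothetical information fractions; actual efficacy boundaries are usually derived with dedicated group-sequential software.

```python
from scipy.stats import norm

def obrien_fleming_spend(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t under the
    O'Brien-Fleming-type (Lan-DeMets) spending function."""
    return 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / t ** 0.5))

# Hypothetical look schedule: one interim at 50% information, final at 100%
for t in (0.5, 1.0):
    print(f"information fraction {t:.0%}: cumulative alpha spent = {obrien_fleming_spend(t):.4f}")
```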
14) Safety analysis (often the biggest section)
Exposure summaries (duration, dose intensity)
TEAEs, SAEs, AESIs:
Coding dictionary version (MedDRA)
Treatment-emergent definition window
Summaries by SOC/PT, severity, relatedness
Risk differences, incidence rates (if time-at-risk differs)
Labs/vitals/ECG:
Shift tables (baseline → worst on-treatment grade)
Clinically significant thresholds
Deaths and discontinuations:
Narratives plan (who writes, template, QC)
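A minimal sketch of the lab shift table, cross-tabulating baseline against worst on-treatment toxicity grade for a handful of hypothetical subjects.

```python
import pandas as pd

# Hypothetical ALT toxicity grades (CTCAE-style) per subject
labs = pd.DataFrame({
    "subject": ["001", "002", "003", "004", "005", "006"],
    "baseline_grade": [0, 0, 1, 0, 2, 0],
    "worst_on_treatment_grade": [0, 2, 1, 3, 2, 1],
})

# Shift table: baseline grade (rows) vs worst on-treatment grade (columns)
print(pd.crosstab(labs["baseline_grade"], labs["worst_on_treatment_grade"], margins=True))
```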
15) Patient-reported outcomes / QoL (if present)
Scoring rules, missing item handling, responder definitions
Timepoints and multiplicity
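As an example of handling missing PRO items, the sketch below applies a common "half rule" (prorate the scale score when at least half the items are answered); the actual rule is instrument-specific and must follow the developer's scoring manual.

```python
import numpy as np
import pandas as pd

def prorated_scale_score(items: pd.DataFrame, min_answered_fraction: float = 0.5) -> pd.Series:
    """Scale score = mean of answered items, reported only when at least half of
    the items are answered (a common 'half rule'); otherwise set to missing."""
    answered = items.notna().sum(axis=1)
    score = items.mean(axis=1, skipna=True)
    return score.where(answered >= min_answered_fraction * items.shape[1], np.nan)

# Hypothetical 4-item scale for three subjects (rows)
items = pd.DataFrame({"q1": [3, 2, None], "q2": [4, None, None],
                      "q3": [3, 2, None], "q4": [4, None, 1]})
print(prorated_scale_score(items))
```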
16) Data standards + outputs
Dataset standards (SDTM/ADaM if used)
TLF shells (Tables, Listings, Figures) included or referenced
Mock outputs + footnotes conventions
17) Quality control & reproducibility
Double programming / independent validation plan
Audit trail of code, datasets, outputs
Final SAP sign-off procedure
SAP Deliverables
Final signed SAP (pre-DBL)
TLF shells (mock tables/figures)
Programming specs / analysis dataset specs (often separate but linked)
Topline outputs + final TLF package
C. Execution of analysis (after DB lock)
Data freeze → DBL → pull analysis datasets
Run primary analysis exactly per SAP
QC and discrepancy resolution
Generate:
Topline summary (fast sponsor decision-making)
Full TLFs + narratives inputs
Document any deviations from SAP (rare; justified and logged)
4) Summary Report to Sponsor
A. Types of sponsor-facing reporting (typical sequence)
Topline / Executive summary
Primary endpoint result, key safety signals, major deviations
Go/no-go decision support
Clinical Study Report (CSR)
Full regulatory-style report (often aligned with ICH E3 structure)
Integrated efficacy + safety + trial conduct
Supporting packages
TLF appendix
Patient narratives (deaths, SAEs, discontinuations)
Data definition lists, audit trail evidence
Protocol deviations listing and impact discussion
B. CSR workplan tasks
CSR shell creation early (while trial ongoing)
Populate sections after DBL:
Disposition, baseline, efficacy, safety, deviations
Medical writing + statistical review cycles
QC steps:
Table/figure cross-checks
Consistency checks (numbers match across text, tables, listings)
Traceability (protocol ↔ SAP ↔ CSR)
Sponsor sign-off and finalization
Optional: manuscript drafting, conference abstract, registry reporting
Outputs
Topline report
Final CSR + appendices
Sponsor slide deck (board-level summary)
✅ Key takeaways
Protocol + sample size sets the scientific contract of the trial.
Project setup + data management ensures data are clean, traceable, auditable.
SAP is the “no-flex” analysis rulebook (pre-DBL), far more detailed than protocol.
Sponsor report/CSR translates results into decision- and regulator-ready format.