Cohen’s Kappa Explained: Weighted Agreement in Clinical Research

Introduction: Why Use Cohen’s Kappa in Diagnostic Research?

Cohen’s Kappa (κ) is a widely used statistical measure for assessing agreement between two raters or measurement methods that classify items into categorical outcomes. It adjusts for the agreement that could occur by chance, providing a more realistic and conservative estimate of concordance.

Purpose and Applications

  1. Measuring Inter-rater or Inter-method Agreement
    • Kappa is primarily used to quantify the level of agreement between observers or diagnostic methods.
    • Example: Two hepatologists classify the same patient cohort for liver fibrosis stage based on MRI and biopsy results.
  2. Adjusting for Chance Agreement
    • Unlike raw percentage agreement, Kappa accounts for agreement that could occur randomly, ensuring the result reflects true concordance between raters or methods.
  3. Analyzing Categorical or Ordinal Data
    • Kappa applies to categorical variables, both nominal (unordered) and ordinal (ordered).
    • Weighted Kappa, in particular, is appropriate when the categories have an inherent order, such as staging or severity grading.

🔬 Importance in Diagnostic Accuracy Research

In diagnostic accuracy research, Cohen’s Kappa serves as a reliability metric for evaluating how consistently different raters or methods classify the same patients.

📌 Diagnostic accuracy alone is not sufficient unless the test or observer demonstrates consistent and reproducible results across repeated assessments or raters.


1. What is Cohen’s Kappa (κ)?

Cohen’s kappa is a statistical measure of agreement between two raters (or methods) who classify items into categorical outcomes. It adjusts for agreement that would occur by chance, providing a more realistic measure of concordance.
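The chance adjustment described above is usually written as follows. The symbols are notation introduced here for illustration: p_o is the observed proportion of agreement, p_e is the proportion expected by chance from the two raters' marginal totals, and the numbers in the worked line are hypothetical.

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \sum_{i} p_{ii},
\qquad
p_e = \sum_{i} p_{i\cdot}\, p_{\cdot i}
\]

% Hypothetical worked example: 80% observed agreement, 50% expected by chance
\[
\kappa = \frac{0.80 - 0.50}{1 - 0.50} = \frac{0.30}{0.50} = 0.60
\]
```

In this made-up case, raw agreement of 80% shrinks to κ = 0.60 once chance agreement is removed.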

The kappa coefficient ranges from –1 to +1:

κ value | Interpretation
< 0 | Less than chance agreement
0.00–0.20 | Slight
0.21–0.40 | Fair
0.41–0.60 | Moderate
0.61–0.80 | Substantial
0.81–1.00 | Almost perfect


2. Why “weighted” kappa?

When categories have a natural order (e.g., F0 < F1 < F2 < F3 < F4), not all disagreements are equal.

Weighted kappa assigns partial credit for “near” agreement, reflecting the degree of difference.

In essence, weighting makes κ sensitive not only to whether disagreement exists, but also how big that disagreement is.
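One common way to write this down, sketched here with notation that is not in the original post: w_ij is the credit given when one rater chooses category i and the other chooses j, p_ij is the observed proportion in that cell, and p_i· and p_·j are the row and column marginal proportions.

```latex
\[
\kappa_w = \frac{p_{o(w)} - p_{e(w)}}{1 - p_{e(w)}},
\qquad
p_{o(w)} = \sum_{i}\sum_{j} w_{ij}\, p_{ij},
\qquad
p_{e(w)} = \sum_{i}\sum_{j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}
\]
```

Setting w_ij = 1 on the diagonal and 0 everywhere else gives back the unweighted κ, so simple kappa is the special case in which every disagreement counts as a complete miss.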


3. Weight types in Stata (kap command)

In Stata, the kap command allows you to specify different weight schemes using the wgt() option:

kap variable1 variable2, wgt(w)   // linear weights
kap variable1 variable2, wgt(w2)  // quadratic weights
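To try both options without a real dataset, here is a minimal do-file sketch. The variable names biopsy_stage and mre_stage and all counts are made up for illustration. It assumes kap accepts frequency weights and leaves the estimate in r(kappa).

```stata
// Hypothetical 5x5 cross-classification; stages F0-F4 coded 0-4, counts are made up
clear
input biopsy_stage mre_stage n
0 0 15
0 1 5
1 0 3
1 1 14
1 2 3
2 1 3
2 2 14
2 3 3
3 2 3
3 3 14
3 4 3
4 3 5
4 4 15
end

// Linear weights: 1 - |i-j|/(k-1)
kap biopsy_stage mre_stage [fweight = n], wgt(w)
scalar k_linear = r(kappa)

// Quadratic weights: 1 - ((i-j)/(k-1))^2
kap biopsy_stage mre_stage [fweight = n], wgt(w2)
scalar k_quadratic = r(kappa)

display "linear kappa = " %5.3f k_linear "   quadratic kappa = " %5.3f k_quadratic
```

All five stage codes appear for both raters in this toy table. That matters because the weights are indexed by the categories actually present in the data, so a stage that never occurs would shift the weights assigned to the remaining ones.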

3.1. Linear Weights (wgt(w))

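Linear weights give partial credit that falls in equal steps as the two ratings move further apart, so the penalty is proportional to the number of stages between them.

🔹 Example: If MRE = F2 and biopsy = F3, penalty = 1 step (small). If MRE = F0 and biopsy = F4, penalty = 4 steps (large).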
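As a worked illustration of linear weighting, the formula below uses i and j for the two assigned categories and k for the number of categories, with k = 5 for F0–F4.

```latex
\[
w_{ij} = 1 - \frac{|i - j|}{k - 1}
\]

% k = 5 (F0-F4): credit by number of stages apart
\[
\begin{array}{c|ccccc}
|i-j|  & 0 & 1    & 2    & 3    & 4 \\ \hline
w_{ij} & 1 & 0.75 & 0.50 & 0.25 & 0
\end{array}
\]
```

An F2 versus F3 disagreement therefore keeps 75% credit, while F0 versus F4 keeps none.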

3.2. Quadratic Weights (wgt(w2))

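🔹 In medical research (especially fibrosis staging), quadratic weighting is the standard choice because it penalizes large staging discrepancies much more heavily than adjacent-stage ones, which suits ordinal scales whose stages are not clinically equidistant.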
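For comparison with linear weighting, the quadratic scheme squares the scaled distance. The sketch below again uses i and j for the two assigned categories and k = 5 for F0–F4.

```latex
\[
w_{ij} = 1 - \left(\frac{i - j}{k - 1}\right)^{2}
\]

% k = 5 (F0-F4): credit by number of stages apart
\[
\begin{array}{c|ccccc}
|i-j|  & 0 & 1      & 2    & 3      & 4 \\ \hline
w_{ij} & 1 & 0.9375 & 0.75 & 0.4375 & 0
\end{array}
\]
```

A one-stage miss keeps almost full credit (0.9375), and the credit then drops faster with each further stage, which is why large discrepancies dominate the quadratic penalty.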


4. When to choose each weighting

Weight type | Use when… | Typical example
Unweighted (default) | Data are nominal (no inherent order) | Male/Female; Positive/Negative
Linear (wgt(w)) | Categories are ordered and evenly spaced | Pain scores (1–10), Likert scales
Quadratic (wgt(w2)) | Categories are ordered but not equally spaced; large errors matter more | Liver fibrosis (F0–F4), cancer grades, disease severity scales

5. How to compare weighting results

Run both commands:

kap LiverBx_FCHFS_5stage MRE_stage, wgt(w)
kap LiverBx_FCHFS_5stage MRE_stage, wgt(w2)

Interpret the difference (these cut-offs are an informal rule of thumb, not a formal test):

Δκ = κ(w2) – κ(w) | Meaning | Suggested action
< 0.05 | Differences mostly adjacent; linear is acceptable | Either
0.05–0.10 | Moderate nonlinearity; prefer quadratic | Prefer wgt(w2)
> 0.10 | Many large disagreements; quadratic weighting clearly appropriate | Use wgt(w2)

6. Reporting weighted kappa in a study

When publishing:

Weighted Cohen’s kappa was used to evaluate agreement between MRE-derived and biopsy-derived fibrosis stage. Quadratic weighting was applied to penalize larger staging discrepancies more heavily, given the ordinal and clinically non-linear nature of fibrosis stages.

Example result:

Agreement between MRE and biopsy staging was substantial (quadratic weighted κ = 0.74, 95% CI 0.61–0.87).
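The kap output includes a standard error and a z test but, to my knowledge, no confidence interval, so the interval has to come from elsewhere. One possible route is a bootstrap. The sketch below is not necessarily how the interval above was obtained. It assumes the dataset with LiverBx_FCHFS_5stage and MRE_stage is already in memory, and the replication count and seed are arbitrary choices.

```stata
// Resample patients and recompute the quadratic weighted kappa each time
bootstrap kappa = r(kappa), reps(1000) seed(12345): ///
    kap LiverBx_FCHFS_5stage MRE_stage, wgt(w2)

// Percentile-based 95% confidence interval from the bootstrap replicates
estat bootstrap, percentile
```

With sparse stages, some resamples may miss a category entirely, which changes the weights in that replicate, so the interval should be read with care in small samples.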


7. Applicability — Is weighting for all κ?

No — weighted kappa is only for ordinal data.

Data type | Kappa type | Example
Nominal | Simple (unweighted) | Gender, infection present/absent
Ordinal | Weighted | Liver fibrosis stage, NYHA class
Continuous | Not kappa; use ICC (Intraclass Correlation Coefficient) | Lab values, test results

8. Practical Example in Hepatology

// Step 1: Compare biopsy vs MRE stage
kap LiverBx_FCHFS_5stage MRE_stage, wgt(w2)

// Step 2: Compare biopsy vs Ultrasound stage
kap LiverBx_FCHFS_5stage US_stage, wgt(w2)

Interpretation:

“Quadratic weighted κ was used due to ordinal staging. κ = 0.68 indicated substantial agreement between imaging and histology.”


9. Summary Table

Type of kappa | Data type | Weight formula | When to use | Stata code
Simple κ | Nominal | None | Categorical (unordered) | kap x y
Linear weighted κ | Ordinal (equal spacing) | 1 – |i–j|/(k–1) | Pain scores, Likert scales | kap x y, wgt(w)
Quadratic weighted κ | Ordinal (uneven spacing) | 1 – ((i–j)/(k–1))² | Clinical staging, fibrosis | kap x y, wgt(w2)

10. Key takeaway

Weighted kappa refines the measurement of agreement by considering how far apart disagreements are. Use linear weights when all categories are evenly spaced. Use quadratic weights for clinically ordered but non-linear categories — the most common in medical research. In fibrosis staging, quadratic weighting (wgt(w2)) is almost always the correct choice.


Example in Hepatology Context

In studies assessing liver fibrosis staging (F0–F4), each rater or method is represented as its own variable, for example LiverBx_FCHFS_5stage (biopsy stage) and MRE_stage (MRE-derived stage).

In Stata, this comparison is typically analyzed using weighted Kappa:

kap LiverBx_FCHFS_5stage MRE_stage, wgt(w2)

Here, quadratic weighting (wgt(w2)) is applied to account for the ordinal nature of fibrosis staging, where large discrepancies (e.g., F0 vs F4) are penalized more heavily than minor ones (e.g., F2 vs F3).
