How to Choose Statistical Coefficients for Each Type of Reliability

Summary Table: Types of Reliability & Statistical Coefficients

| Reliability Type | Purpose | Data Type | Statistical Coefficients (Named Statistics) |
| --- | --- | --- | --- |
| 1. Test–Retest Reliability | Measures stability over time (same test, two occasions) | Continuous | Pearson r; Spearman ρ (ordinal or non-normal); ICC (Intraclass Correlation Coefficient); CCC (Concordance Correlation Coefficient) |
| | | Ordinal | Spearman ρ; Weighted Cohen’s kappa |
| | | Nominal / Dichotomous | Cohen’s kappa (κ) |
| | | | Coefficient of Stability (concept label) |
| 2. Inter-Rater Reliability | Measures agreement between different raters | Continuous | ICC; CCC; (Bland–Altman plot for visual agreement) |
| | | Ordinal | Weighted Cohen’s kappa; Krippendorff’s alpha (multi-rater) |
| | | Nominal / Dichotomous | Cohen’s kappa (2 raters); Fleiss’ kappa (≥3 raters); Scott’s Pi; Brennan–Prediger kappa; Krippendorff’s alpha |
| 3. Intra-Rater Reliability | Measures consistency of one rater over time | Continuous | ICC; CCC |
| | | Ordinal | Weighted Cohen’s kappa |
| | | Nominal / Dichotomous | Cohen’s kappa |
| 4. Parallel-Forms Reliability | Measures equivalence between Form A and Form B | Continuous | Pearson r; Spearman ρ; ICC |
| | | Ordinal | Spearman ρ |
| | | Nominal / Dichotomous | Cohen’s kappa (rare use) |
| | | | Coefficient of Equivalence (concept label) |
| 5. Internal Consistency Reliability | Measures how well items in a scale measure the same construct | Multi-item scale | Cronbach’s alpha (α) (most used); McDonald’s omega (ω) (better when factor loadings differ); Guttman’s lambda (λ2–λ6); KR-20 (dichotomous items only); KR-21; Split-half reliability + Spearman–Brown formula; Coefficient H (latent variable reliability) |
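
The summary table can also be read as a simple lookup from reliability type and data type to candidate coefficients. The sketch below encodes it that way in Python. The dictionary keys, the name suggest_statistics, and the ASCII spellings of the coefficient names are illustrative choices made for this example, not part of any published tool.

```python
# Lookup built from the summary table above.
# Key names and the function name are hypothetical, chosen for illustration.
RELIABILITY_STATS = {
    ("test-retest", "continuous"): ["Pearson r", "Spearman rho", "ICC", "CCC"],
    ("test-retest", "ordinal"): ["Spearman rho", "Weighted Cohen's kappa"],
    ("test-retest", "nominal"): ["Cohen's kappa"],
    ("inter-rater", "continuous"): ["ICC", "CCC", "Bland-Altman plot"],
    ("inter-rater", "ordinal"): ["Weighted Cohen's kappa", "Krippendorff's alpha"],
    ("inter-rater", "nominal"): [
        "Cohen's kappa", "Fleiss' kappa", "Scott's pi",
        "Brennan-Prediger kappa", "Krippendorff's alpha",
    ],
    ("intra-rater", "continuous"): ["ICC", "CCC"],
    ("intra-rater", "ordinal"): ["Weighted Cohen's kappa"],
    ("intra-rater", "nominal"): ["Cohen's kappa"],
    ("parallel-forms", "continuous"): ["Pearson r", "Spearman rho", "ICC"],
    ("parallel-forms", "ordinal"): ["Spearman rho"],
    ("parallel-forms", "nominal"): ["Cohen's kappa"],
    ("internal-consistency", "multi-item"): [
        "Cronbach's alpha", "McDonald's omega", "Guttman's lambda",
        "KR-20", "KR-21", "Split-half + Spearman-Brown", "Coefficient H",
    ],
}


def suggest_statistics(contexts, data_types):
    """Return candidate coefficients for every selected context/data-type pair.

    Duplicates are dropped while keeping the order of first appearance.
    Combinations that are not in the table simply contribute nothing.
    """
    suggestions = []
    for context in contexts:
        for data_type in data_types:
            for stat in RELIABILITY_STATS.get((context, data_type), []):
                if stat not in suggestions:
                    suggestions.append(stat)
    return suggestions


print(suggest_statistics(["inter-rater"], ["ordinal", "nominal"]))
```

More than one context or data type can be passed at once, which is useful when a scale combines, for example, ordinal and dichotomous items.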

Introduction

In measurement and psychometrics, reliability describes how consistently an instrument measures whatever it is supposed to measure. Different types of reliability focus on different sources of variation: time, raters, forms, and items.

Below are the main reliability types, with their key named statistics.


1. Test–Retest Reliability

What it is

Test–retest reliability asks:

If we measure the same person with the same instrument at two different times (and the trait has not truly changed), do we get similar scores?

It reflects the stability over time of a measurement.

When it is used

Main statistics (named coefficients)

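For continuous scores: Pearson r, Spearman ρ (for ordinal or non-normal data), ICC (Intraclass Correlation Coefficient), and CCC (Concordance Correlation Coefficient).

For categorical scores: Spearman ρ or weighted Cohen’s kappa for ordinal data, and Cohen’s kappa (κ) for nominal or dichotomous data.

General term: Coefficient of Stability (a concept label rather than a separate statistic).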
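
To see why the choice among the continuous-score coefficients matters for test–retest data, the sketch below compares Pearson r with Lin's concordance correlation coefficient (CCC) in plain Python. The scores are made up for illustration: every person scores exactly 3 points higher at the second visit. The function names pearson_r and lin_ccc are this example's own, and the CCC uses the population (divide by n) form of the variances and covariance.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments).

    Unlike Pearson r, it is penalised when the two occasions differ
    in mean or in spread, not only when the ordering changes.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical scores: time 2 is time 1 shifted upward by 3 points.
time1 = [10, 12, 14, 16, 18]
time2 = [13, 15, 17, 19, 21]

print(round(pearson_r(time1, time2), 2))
print(round(lin_ccc(time1, time2), 2))
```

With these numbers Pearson r is 1.0 while the CCC is 0.64: the ranking of people is perfectly preserved, but the scores themselves do not agree. That is the practical reason agreement-type coefficients such as ICC and CCC are listed alongside the correlation coefficients.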


2. Inter-Rater Reliability

What it is

Inter-rater reliability asks:

If two or more raters assess the same subjects, how consistently do they agree?

It focuses on agreement between different observers.

When it is used

Main statistics (named coefficients)

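For categorical ratings: Cohen’s kappa (2 raters), Fleiss’ kappa (≥3 raters), Scott’s Pi, Brennan–Prediger kappa, and Krippendorff’s alpha for nominal or dichotomous data; weighted Cohen’s kappa or Krippendorff’s alpha (multi-rater) for ordinal data.

For continuous ratings: ICC and CCC, with a Bland–Altman plot for visual agreement.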
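
As a minimal sketch of the two-rater categorical case, the code below computes unweighted Cohen's kappa from scratch: observed agreement, corrected for the agreement expected by chance from each rater's marginal proportions. The ratings and the names cohen_kappa, doctor_a, and doctor_b are invented for this example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters and nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters give the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("Kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical data: two doctors classify the same 10 patients.
doctor_a = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
doctor_b = ["pos", "pos", "pos", "pos", "neg", "pos", "neg", "neg", "neg", "neg"]

print(round(cohen_kappa(doctor_a, doctor_b), 2))
```

Here the doctors agree on 8 of 10 patients (80%), but half of that agreement would be expected by chance, so kappa is 0.6 rather than 0.8.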


3. Intra-Rater Reliability

What it is

Intra-rater reliability asks:

If the same rater assesses the same subjects on different occasions, are their own ratings consistent?

This is about the self-consistency of one observer.

When it is used

Main statistics (named coefficients)

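For continuous data: ICC and CCC.

For categorical data: weighted Cohen’s kappa for ordinal data and Cohen’s kappa for nominal or dichotomous data.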

For agreement emphasis:

Conceptually, intra-rater uses the same coefficients as inter-rater reliability, but all ratings come from one person at multiple times instead of multiple people.
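
For continuous intra-rater data, one common choice is an intraclass correlation. The sketch below computes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), from the usual ANOVA mean squares. Which ICC form is appropriate depends on the study design, so treating ICC(2,1) as the right one here is an assumption of this example, as are the function name icc_2_1 and the measurements.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: one row per subject, one column per occasion (or rater).
    Undefined (division by zero) if every value in the table is identical.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("Need at least 2 subjects and 2 complete columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares for subjects (rows), occasions (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical data: one rater measures the same 5 subjects on two occasions.
measurements = [
    [10, 11],
    [12, 12],
    [14, 15],
    [16, 15],
    [18, 19],
]

print(round(icc_2_1(measurements), 3))
```

The same function works for inter-rater data by putting one rater per column instead of one occasion per column, which mirrors the point that the two designs share their coefficients.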


4. Parallel-Forms (Alternate-Forms) Reliability

What it is

Parallel-forms reliability asks:

If we create two different but equivalent versions of a test, do they give similar results for the same people?

This focuses on consistency between two forms of the same instrument.

When it is used

Main statistics (named coefficients)

For continuous test scores: Pearson r, Spearman ρ, and ICC.

Generic term: Coefficient of Equivalence (a concept label).

When the two forms are also administered on different occasions (alternate-forms combined with test–retest), the resulting correlation is sometimes called the coefficient of stability and equivalence. That is a conceptual label rather than a distinct formula.
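
When form scores are ordinal or clearly non-normal, Spearman ρ is the rank-based option from the summary table. The sketch below computes it as a Pearson correlation on average ranks, so tied scores are handled. The form scores and the names average_ranks and spearman_rho are made up for illustration.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_rank = (len(x) + 1) / 2  # mean of average ranks, with or without ties
    sxy = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rank) ** 2 for a in rx)
    syy = sum((b - mean_rank) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores of the same 6 people on Form A and Form B.
form_a = [55, 60, 65, 70, 75, 80]
form_b = [50, 62, 61, 72, 74, 85]

print(round(spearman_rho(form_a, form_b), 3))
```

A high ρ shows that the two forms rank people similarly. It does not show that the forms give the same scores, so an agreement coefficient such as ICC is still worth reporting when the forms are meant to be interchangeable.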


5. Internal Consistency Reliability

What it is

Internal consistency asks:

Do the items within a single test or questionnaire all work together to measure the same underlying construct?

This is particularly important for multi-item scales (e.g., depression scales, quality of life instruments, satisfaction surveys).

When it is used

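Main statistics (named coefficients)

Cronbach’s alpha (α), the most used; McDonald’s omega (ω), better when factor loadings differ; Guttman’s lambda (λ2–λ6); KR-20 (dichotomous items only); KR-21; split-half reliability with the Spearman–Brown formula; and Coefficient H (latent variable reliability).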

Other related indices:
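
As a rough sketch of two coefficients from the summary table, the code below computes Cronbach's alpha from item scores and applies the Spearman–Brown formula to a split-half correlation. The item responses, the half-test correlation of 0.6, and the function names are invented for this example. Sample variances (n - 1 denominator) are used throughout.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, with respondents in the same order.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("Need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def spearman_brown(r_half):
    """Step up a correlation between two half-tests to full-test reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical responses: 3 items answered by the same 5 respondents.
item1 = [2, 3, 4, 4, 5]
item2 = [1, 3, 3, 4, 5]
item3 = [2, 2, 4, 5, 5]

print(round(cronbach_alpha([item1, item2, item3]), 3))
print(round(spearman_brown(0.6), 2))  # 0.6 is an assumed half-test correlation
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as evidence that the items measure the same construct.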


Summary

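Each type of reliability focuses on a different question:

Test–retest: are scores stable over time?
Inter-rater: do different raters agree?
Intra-rater: is one rater consistent over time?
Parallel-forms: are Form A and Form B equivalent?
Internal consistency: do the items in a scale measure the same construct?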
