Multiple Imputation in Clinical Research: Univariate and Multivariate Approaches [Multiple imputation, MI]

Mayta
May 31

Introduction

Missing data is a recurrent challenge in clinical and epidemiological studies. Multiple Imputation (MI) provides a statistically rigorous way to analyze incomplete datasets while accounting for the uncertainty about the missing values. As datasets grow more complex, with several incomplete and interdependent variables, understanding how to perform univariate and multivariate imputations becomes essential.

This article provides a step-by-step overview of how Multiple Imputation (MI) is performed under both simple and complex patterns of missingness, with a focus on practical implementation and decision-making logic.

Multiple Imputation: A Recap of the Process

MI is not a one-time fill-in-the-blank operation. It consists of three critical phases:

Imputation: Generate several plausible datasets where missing values are filled using model-based predictions and injected variability.
Analysis: Perform the desired statistical model on each imputed dataset independently.
Pooling: Combine results across datasets (typically with Rubin's rules) to produce a final estimate whose standard error reflects both within- and between-imputation variability.

This ensures that imputed values are not treated as known truths but as approximations that respect statistical uncertainty.
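The pooling phase can be illustrated with a small Python sketch of Rubin's rules. This is only an illustration of the arithmetic, not the code Stata runs internally; the function name `pool_rubin` and the input numbers are made up for the example.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)        # average of the m estimates
    within = statistics.fmean(variances)        # within-imputation variance
    between = statistics.variance(estimates)    # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between      # total variance of the pooled estimate
    return pooled, total ** 0.5

# Hypothetical coefficients and squared standard errors from m = 3 imputed datasets
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled_estimate, pooled_se = pool_rubin(estimates, variances)
```

With these made-up inputs, the pooled estimate is 2.0 and the total variance is 0.5 + (4/3)(1.0), so the pooled standard error is larger than the standard error from any single imputed dataset. That extra width is the between-imputation uncertainty.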

Univariate Imputation: One Variable, One Problem

When to Use

This approach is suitable when only one variable has missing data, and all others are fully observed.

How It Works

The missing variable (e.g., a biomarker) is modeled as a function of fully observed predictors.
The model predicts values for the missing entries, with added noise to account for uncertainty.

Example

Imagine a dataset where a continuous lab value is missing for a subset of patients. You might predict it using age, sex, and clinical scores—all of which are complete.

Imputation Model: Missing_lab = f(age, sex, score)
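As a rough illustration of "prediction plus noise", the Python sketch below imputes a lab value from a single complete predictor (age). It is deliberately simplified. It uses one predictor instead of three, the data are invented, and it only adds residual noise. A full MI procedure would also draw the regression coefficients from their posterior distribution before each imputation. The name `impute_once` is hypothetical.

```python
import random
import statistics

def impute_once(x, y, rng):
    """Return a copy of y with None entries replaced by regression prediction plus noise."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [p[0] for p in observed]
    ys = [p[1] for p in observed]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - x_mean) ** 2 for xi in xs)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in observed) / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation from the observed cases (n - 2 degrees of freedom)
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in observed)
    sigma = (rss / (len(observed) - 2)) ** 0.5
    return [
        yi if yi is not None else intercept + slope * xi + rng.gauss(0, sigma)
        for xi, yi in zip(x, y)
    ]

# Invented data: age is complete, the lab value is missing for two patients
age = [30, 40, 50, 60, 70, 80]
lab = [1.1, 1.9, None, 4.2, None, 5.8]

rng = random.Random(2024)
completed = [impute_once(age, lab, rng) for _ in range(5)]  # m = 5 imputed datasets
```

Each of the five completed datasets keeps the observed lab values unchanged and fills the two gaps with different draws, which is what later allows the between-imputation variability to be estimated.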

Implementation in Stata

The mi impute command family allows various model types:

mi impute regress: Linear regression for continuous variables
mi impute logit: Logistic regression for binary variables
Other forms include truncated regression (mi impute truncreg), ordinal logistic (mi impute ologit), Poisson (mi impute poisson), and predictive mean matching (mi impute pmm).

Multivariate Imputation: More Variables, More Complexity

When to Use

Multivariate imputation is necessary when multiple variables are missing, often in overlapping and interdependent ways.

Key Distinction: Monotone vs Non-Monotone Patterns

Monotone Missingness

Definition: The variables can be ordered so that missingness is nested: if variable X3 is missing for a patient, then X4 and every later variable in the sequence are missing as well.
Example: In longitudinal studies, later follow-up data are often missing for patients lost to follow-up.
Method: Perform a sequence of univariate imputations. Begin with the most complete variable, then impute each subsequent variable using the variables earlier in the sequence as predictors. No iteration is required.

Non-Monotone Missingness

Definition: The missingness pattern is arbitrary. Any variable may be missing regardless of whether the others are observed, so no ordering of the variables produces a nested sequence.
Example: A patient may have missing baseline lab data and missing outcome data, while other covariates are intact.
Challenge: Cannot rely on a simple univariate chain. Requires iterative modeling that accommodates feedback loops.
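The monotone versus non-monotone distinction can be checked mechanically. The Python sketch below is one possible way to do it, using invented data and a hypothetical function name `is_monotone`. It orders the variables from most to least complete and tests whether each variable's set of missing rows is contained in the next one's.

```python
def is_monotone(rows):
    """Return (True/False, variable order) for a list of dict records with None as missing."""
    names = list(rows[0].keys())
    # For each variable, the set of row indices where it is missing
    missing = {v: {i for i, r in enumerate(rows) if r[v] is None} for v in names}
    # Most complete variable first
    ordered = sorted(names, key=lambda v: len(missing[v]))
    # Monotone means the missing sets are nested along this ordering
    nested = all(missing[a] <= missing[b] for a, b in zip(ordered, ordered[1:]))
    return nested, ordered

# Dropout in a longitudinal study: once a visit is missing, all later visits are missing
follow_up = [
    {"visit1": 5.1, "visit2": 5.0, "visit3": 4.8},
    {"visit1": 6.0, "visit2": 5.7, "visit3": None},
    {"visit1": 5.5, "visit2": None, "visit3": None},
]

# Arbitrary pattern: each variable is missing where the other is observed
arbitrary = [
    {"lab": None, "outcome": 1.0},
    {"lab": 2.0, "outcome": None},
]

follow_up_result = is_monotone(follow_up)
arbitrary_result = is_monotone(arbitrary)
```

The first dataset is reported as monotone with the order visit1, visit2, visit3, which is also the order in which the variables would be imputed. The second is not monotone under any ordering.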

Advanced Solutions for Multivariate Missingness

To tackle the complexity of multivariate patterns—especially non-monotone—modern MI techniques rely on iterative algorithms.

1. Multiple Imputation by Chained Equations (MICE)

Also known as Fully Conditional Specification (FCS).
Models each variable with missing data conditionally, using all other variables as predictors.
Iteratively cycles through each variable, updating imputations with every pass.
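The cycling idea can be sketched in Python for two incomplete continuous variables. This is a toy version under strong assumptions, not the algorithm as implemented in any particular package. It uses simple linear models only, invented data, and residual noise without parameter draws, and it produces a single imputed dataset. The names `fit_line` and `chained_impute` are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y on x."""
    xm, ym = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xm) ** 2 for x in xs)
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sxx
    intercept = ym - slope * xm
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5

def chained_impute(a, b, n_iter=10, seed=1):
    """Impute two variables by cycling through one conditional model per variable."""
    rng = random.Random(seed)
    miss = {"a": [v is None for v in a], "b": [v is None for v in b]}
    data = {}
    for name, col in (("a", a), ("b", b)):
        # Starting values: fill every gap with the observed mean
        mean = statistics.fmean([v for v in col if v is not None])
        data[name] = [mean if v is None else v for v in col]
    for _ in range(n_iter):
        for target, predictor in (("a", "b"), ("b", "a")):
            # Fit on rows where the target is observed, using current values of the predictor
            rows = [i for i, m in enumerate(miss[target]) if not m]
            b0, b1, sd = fit_line([data[predictor][i] for i in rows],
                                  [data[target][i] for i in rows])
            # Refresh the imputations for the target's missing entries
            for i, m in enumerate(miss[target]):
                if m:
                    data[target][i] = b0 + b1 * data[predictor][i] + rng.gauss(0, sd)
    return data["a"], data["b"]

# Invented data with a non-monotone pattern
a = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]
b = [2.1, None, 6.2, 8.1, None, 12.2, 13.9, 16.0]
a_filled, b_filled = chained_impute(a, b)
```

Running the function several times with different seeds would give the multiple datasets that the analysis and pooling phases need. Observed values are never altered. Only the entries that were missing are updated on each pass.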

2. Monotone Method

Only usable when missingness follows a clear sequence.
Faster, but less flexible.

3. Multivariate Normal Imputation (MVN)

Assumes that all variables follow a multivariate normal distribution.
Best used when this assumption approximately holds (e.g., many continuous variables).

Stata Commands

mi impute chained: For non-monotone data, most commonly used.
mi impute monotone: For sequential missing patterns.
mi impute mvn: For continuous data under multivariate normality.

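Tip: If the pattern appears monotone but you have not verified it (for example with mi misstable nested), mi impute chained is still a safe choice. It handles both monotone and arbitrary patterns, at the cost of some extra computation.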

Conclusion

Univariate and multivariate imputations represent two sides of the same methodological coin—simple vs complex—but both serve the goal of reducing bias and improving precision in the face of incomplete data. Mastering the distinctions between them, and applying the right tool based on your data’s missingness structure, is essential for reliable clinical inference.

Key Takeaways

Univariate MI is suitable when only one variable is incomplete. Regression-based or model-specific imputation is used.
Multivariate MI handles datasets with multiple incomplete variables. It distinguishes monotone from non-monotone patterns.
MICE (or FCS) is the most flexible and widely used method for complex missingness structures.
In Stata, mi impute chained is the safest general-purpose tool.
Use the pattern of missingness to choose between sequential univariate imputation (monotone patterns) and an iterative method such as MICE (non-monotone patterns).
