
Stata mfp in Practice: Fractional Polynomials, select(), df(), and the Dummy-Variable xi: Workaround

  • Writer: Mayta

Fractional polynomials, selection control, and using dummy variables (xi: workaround)

This article focuses only on mfp (no multiple imputation, no validation workflow) and is written for researchers who want to:

  1. model non-linear continuous predictors in one regression model, and

  2. understand exactly what the key mfp syntax and options do, especially

    • select()

    • df()

    • and the dummy-variable / xi: workaround when factor-variable syntax is not accepted.


1) What problem does mfp solve?

In many regression models (logistic, Cox, linear, etc.), continuous predictors are often assumed to have a linear effect on the model scale:

  • Logistic: linearity in log-odds

  • Cox: linearity in log-hazard

  • Linear regression: linearity in mean outcome

That assumption can be wrong. Common “bad fixes” include:

  • categorizing continuous variables (information loss, arbitrary cutpoints),

  • guessing quadratic/cubic terms (too ad hoc),

  • univariable screening (unstable).

mfp provides a structured, parametric, reproducible way to:

  • test whether a continuous predictor needs a transformation,

  • pick a transformation from a restricted family (fractional polynomials), and

  • optionally perform backward elimination for predictor selection.


2) What is a fractional polynomial (FP)?

The FP power set

Fractional polynomials use powers from a restricted set, typically:

{-2, -1, -0.5, 0, 0.5, 1, 2, 3}

with the special rule:

  • p = 0 means log(x)

FP1 vs FP2

  • FP1 uses one transformed term: β1·x^p, for one power p from the set above

  • FP2 uses two transformed terms (two powers): β1·x^(p1) + β2·x^(p2); if the two powers are equal, the repeated-power convention makes the second term x^p·log(x)

FP is not “make everything non-linear.” FP is: prove linearity is insufficient, and then choose the simplest adequate curve.
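
To make FP2 concrete, here is a hand-built sketch of one possible FP2 fit for age, assuming powers (0, 2) purely for illustration (the outcome y and the chosen powers are hypothetical; in practice mfp searches the power set and picks the pair itself):

* Sketch only: an FP2 for age with powers (0, 2), built by hand
gen double ln_age = ln(age)
gen double age_sq = age^2
logistic y ln_age age_sq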

3) What mfp does internally (the “loops”)

Think of mfp as doing two tasks:

Task A — Functional form selection (shape)

For each continuous predictor, mfp compares:

  • Linear vs FP1 vs FP2, using deviance / LR-type comparisons, with a controlled testing strategy (closed tests by default; sequential is an alternative).

Task B — Variable selection (optional)

It can also perform backward elimination:

  • a variable is dropped if removing it does not significantly worsen model fit (based on the select() threshold).

Why it runs in cycles

Because once one variable changes shape or drops, the “best shape” of others can change too.

So mfp iterates (“cycles”) until stable.
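
If the model does not stabilize within the default number of cycles, mfp’s cycles() option raises the maximum. A minimal sketch, with a placeholder outcome y:

// Sketch: allow up to 10 shape/selection cycles
mfp, select(0.05) cycles(10) : logistic y age hb wbc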

4) Basic mfp syntax you should memorize

mfp [, options] : regression_cmd yvar xvarlist [, regression_cmd_options]

Examples of regression_cmd include: logistic, logit, stcox, regress, etc.
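
A minimal example with placeholder names (died as the outcome, age and sbp as continuous predictors):

// Sketch: FP shape search for age and sbp using mfp's defaults
mfp : logistic died age sbp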

Special rule: parentheses in xvarlist

You can write:

x1 x2 (x3 x4 x5)

Variables inside parentheses:

  • are tested jointly for inclusion/exclusion

  • are NOT eligible for FP transformation (they remain linear/indicator form)

This is extremely useful for dummy-variable blocks (explained below).
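
For example (placeholder names), three pre-made region dummies can be kept as a linear/indicator block and tested jointly, while age remains eligible for FP transformation:

// Sketch: region2-region4 are pre-made dummies grouped as one block;
// the block is selected jointly and never FP-transformed
mfp, select(0.05) : logistic y age (region2 region3 region4)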

5) Three mfp command patterns used frequently in practice

Below are the three “workhorse” patterns.

Pattern 1 — Shape selection + backward elimination at 0.05

mfp, select(0.05) : logistic group_gimalig age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib thal cirrhos

What it does

  • Tests continuous variables for FP shapes (up to default df rules).

  • Performs backward elimination at the specified select(0.05) threshold, removing predictors that fail to meet it.

  • Returns a “final” model that may be smaller than the original.

When to use

  • Exploratory modeling

  • When you accept automatic selection

Common misinterpretation

This is not a “full model”. It is model selection + shape selection.

Pattern 2 — “Full model” (force all predictors to stay) using select(1)

mfp, select(1) : logistic group_gimalig age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib thal cirrhos

What select(1) means

  • select() is the p-to-remove threshold for backward elimination.

  • If you set it to 1, then nothing can be removed (because no p-value exceeds 1).

So:

  • ✅ all predictors remain in the model

  • ✅ FP functional-form testing still happens for continuous predictors

  • ❌ no reduction happens (even if predictors are noise)

When to use

  • When you want shape selection only, but no variable removal

Pattern 3 — Force some predictors, select others + allow more flexibility: select() + df()

mfp, select( ///
    wbc plt mcv rdw ferritin si male pain wtloss abnbm gib thal cirrhos : 0.05, ///
    age hb : 1 ///
) df(age hb:4) : ///
logistic group_gimalig age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib thal cirrhos

This is the most “methods-sound” pattern when you want:

  • clinically essential covariates forced in, and

  • selection for the rest, and

  • control over allowed curve complexity for key continuous predictors.


6) Deep dive: select(varlist : p) (what it really does)

The rule

select(varlist : p) sets the variable selection threshold for that varlist.

  • If the p-value for the variable (or variable block) is > p, it can be dropped.

  • If the p-value is ≤ p, it stays.

How Pattern 3 works

In Pattern 3:

  • age hb : 1 → force age and hb into the model (never dropped)

  • the remaining predictors : 0.05 → ordinary backward elimination at 0.05

Key conceptual point

  • select() controls inclusion/exclusion

  • It does not control whether the variable is linear or nonlinear (shape is controlled by FP testing and by df() / alpha()); see the sketch below
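
A sketch of the distinction (placeholder model): to force age into the model and also keep it strictly linear, you need both options, because select() only governs inclusion and df() only governs shape:

* Sketch: age is forced in (select age:1) and kept linear (df age:1);
* wbc and hb remain eligible for removal at the 0.05 level
mfp, select(age:1, wbc hb:0.05) df(age:1) : logistic y age wbc hb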


7) Deep dive: df(age hb:4) (what it means in mfp)

In mfp, df() is best understood as:

maximum allowed complexity for FP modeling of that predictor

A practical mapping:

  • df(var:1) → force linear only (no FP transformation)

  • df(var:2) → allow FP1

  • df(var:4) → allow FP2 (the common default maximum)

So:

df(age hb:4)

means:

  • age and hb are allowed to be modeled as FP2 if the data support it.

Why choose higher df for specific predictors?

Because for some predictors (like age, key biomarkers):

  • you’re willing to allow curvature,

  • and you want mfp to test for it properly.

What df() does NOT mean

It does not mean “include 4 polynomial terms” or “make it a 4th-degree polynomial.” It means “allow the FP search to go up to FP2 complexity.”
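
Putting that mapping into one call (a sketch reusing the article’s variable names; the df choices here are illustrative):

* Sketch: FP2 allowed for age and hb, wbc forced to stay linear,
* remaining predictors left at mfp's default df
mfp, select(1) df(age hb:4, wbc:1) : logistic group_gimalig age hb wbc plt mcv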

8) Dummy variables and mfp (the core practical problem)

The limitation

mfp does not accept factor-variable syntax:

  • i.var

  • c.var##c.var

  • time-series operators

So researchers often see:

factor-variable and time-series operators not allowed

The correct strategy

Use either:

  1. pre-created dummy variables (best for clarity + joint testing), or

  2. xi: (quick compatibility).


9) Best practice: pre-create dummy variables + group them as a block

Why pre-created dummies are best for mfp

Because you can:

  • control the reference category,

  • preserve missingness correctly,

  • test the whole categorical predictor jointly with parentheses.

Example: 3-level thal (0=no, 1=trait, 2=disease)

gen thal_trait   = (thal==1) if !missing(thal)
gen thal_disease = (thal==2) if !missing(thal)
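
A quick sanity check (sketch) that the hand-made dummies code the levels as intended and keep missing thal missing:

* Cross-tabulate the original variable against each dummy, including
* missing values, to confirm the coding and the handling of missing thal
tab thal thal_trait, missing
tab thal thal_disease, missing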

Then use them as a block:

mfp, select( (thal_trait thal_disease):0.05, age hb:1 ) : ///
    logistic group_gimalig age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib (thal_trait thal_disease) cirrhos

What parentheses give you

(thal_trait thal_disease) is:

  • selected/dropped together

  • not FP-transformed (as it shouldn’t be)

This is the cleanest approach for manuscripts.

10) Quick compatibility: xi: mfp ... (how to use it)

If you want to keep writing i.thal i.cirrhos (old-school dummy generation), you can do:

xi: mfp, select(1) : ///
    logistic group_gimalig age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib i.thal i.cirrhos

What happens

  • xi: expands i.thal and i.cirrhos into dummy variables (e.g., _Ithal_1, _Ithal_2, etc.)

  • mfp then sees only plain variables and runs normally

Limitation of xi: for mfp

xi: does not automatically group the generated dummies as a single block in the mfp sense. So selection can behave oddly (dropping one dummy and keeping another). That’s why manual dummies + parentheses is still preferred if you care about clean, principled selection.
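
A middle ground, if you like xi’s convenience but want block-wise selection (a sketch; the _Ithal_* and _Icirrhos_* names follow xi’s usual _I prefix but should be confirmed with describe):

xi i.thal i.cirrhos                // create the dummies up front
describe _I*                       // confirm the generated names
mfp, select(1) : ///
    logistic group_gimalig age hb (_Ithal_1 _Ithal_2) (_Icirrhos_1)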

11) How to interpret mfp output (the parts that matter)

In the FP selection table you will see something like:

  • “Lin. vs FP2” with deviance difference and p-value

  • “Final” with the chosen power(s)

How to read it

  • Large p-value for “Lin vs FP2” → linear is adequate

  • Small p-value → FP model improves fit → a transformation is selected

The generated variables

mfp will create variables like:

  • Iage__1, Iage__2, etc.

These are:

  • The transformed versions used in the final model

  • Typically centered (centering helps numeric stability; it does not change fit)
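
To see the selected functional form graphically, the fracplot postestimation command plots the fitted fractional polynomial (with partial residuals) for a chosen predictor. A minimal sketch after any mfp fit:

// Sketch: visualize the fitted FP function for age after running mfp
fracplot age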


12) Common “researcher errors” with mfp (avoid these)

  1. Treating select() p-values as “clinical truth.” Selection is a modeling choice, not a biological conclusion.

  2. Letting mfp select and then claiming performance is final. Shape + selection is data-driven; it usually requires careful reporting and validation later.

  3. Using xi: and expecting perfect categorical handling. xi: is a compatibility tool, not a modeling philosophy.

  4. Forgetting to group dummy variables. A multi-level categorical predictor should usually be tested as a block.


13) Minimal “copy-ready” templates (logistic)

Template A — Full model, shape selection only

mfp, select(1) : logistic y age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib thal_trait thal_disease cir_comp cir_decomp

Template B — Force age+hb in; select among the others; allow FP2 for age+hb

mfp, select( ///
      wbc plt mcv rdw ferritin si male pain wtloss abnbm gib (thal_trait thal_disease) (cir_comp cir_decomp) : 0.05, ///
      age hb : 1 ///
    ) df(age hb:4) : ///
    logistic y age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib (thal_trait thal_disease) (cir_comp cir_decomp)

Template C — Quick xi: compatibility

xi: mfp, select(1) : logistic y age hb wbc plt mcv rdw ferritin si male pain wtloss abnbm gib i.thal i.cirrhos


14) How to report mfp in a Methods section (short and correct)

A clean generic sentence:

“Continuous predictors were modeled using multivariable fractional polynomials (Stata mfp) with the default FP power set. Predictor inclusion was handled via backward elimination at a prespecified threshold, while clinically essential variables were forced into the model. Categorical predictors were entered using indicator (dummy) variables.”

If you used select(1) (full model):

“…with select(1) to force all candidate predictors to remain in the model while allowing FP functional-form selection.”
