Degrees of Freedom in Fractional Polynomial Modeling (FP/MFP): What df(1), df(2), and df(4) Really Mean
- Mayta

- Dec 28, 2025
A clinical-epidemiology article on what “df(1), df(2), df(4)” really mean (and what they do not mean)
Abstract
Fractional polynomials (FP) are a structured approach for modeling non-linear associations between continuous predictors (e.g., age, hemoglobin, creatinine) and outcomes without categorizing variables or using unstable high-degree polynomials. In FP and Stata’s multivariable fractional polynomial (MFP) workflow, the degrees of freedom settings—linear (1 df), FP1 (2 df), and FP2 (4 df)—are often misinterpreted as “polynomial degree” from high-school algebra (degree 1, 2, 3, 4). They are not the same. In MFP, df reflects allowed shape complexity and the penalty used during model selection, not simply “highest power of x.” This article explains the meaning of df in FP/MFP, clarifies the difference from standard polynomial degree, shows how FP1 and FP2 are built, and gives practical guidance for choosing and reporting df in clinical research.
1) Why degrees of freedom matter in clinical models
Clinical predictors rarely have perfectly linear effects. Age may increase risk quickly at first and then plateau; hemoglobin may show a threshold-like pattern; creatinine may have a steep risk gradient at lower values and flatten later. If you assume a straight line when the truth is curved, you risk:
biased effect estimates (wrong slope)
misleading clinical interpretation
poor prediction performance and calibration
spurious “thresholds” when you categorize
Yet if you allow too much flexibility, you risk overfitting—a curve that looks impressive in your dataset but fails in new patients.
Degrees of freedom are the “control knob” that manages this trade-off.
2) The key confusion: df in FP is not “polynomial degree”
What high-school “degree” means
In standard polynomials, “degree” means the highest exponent of x that you include:
Degree 1 (linear): y = c + b1·x
Degree 2 (quadratic): y = c + b1·x + b2·x²
Degree 3 (cubic): y = c + b1·x + b2·x² + b3·x³
Here, you force x², x³, etc. into the model. That’s algebraic degree.
What “df” means in FP/MFP
In FP/MFP, df means how much flexibility you allow to represent the shape of the predictor–outcome relationship and how strongly the procedure penalizes complexity during selection.
So when Stata MFP uses:
linear (1 df)
FP1 (2 df)
FP2 (4 df)
it is not saying “use x¹, x², x³, x⁴.”
It is saying: “Allow the model to spend a bigger flexibility budget if the data justify it.”
3) What fractional polynomials actually do (the FP idea)
Fractional polynomials do not use “degree 4 polynomials.” They use a restricted menu of powers applied to x, conventionally the set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, which includes:
negative powers (example: x⁻¹, x⁻²)
fractional powers (example: x^0.5 = √x)
log transformation (special case: power 0 means ln(x))
familiar powers (x¹, x², x³)
This restricted set is intentional: it provides flexibility while avoiding wild oscillations and extreme behavior that classic high-degree polynomials can produce.
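The power menu can be written out as a small Python sketch. The names FP_POWERS and fp_transform are illustrative choices for this article, not part of any Stata or library API; the only convention taken from FP itself is that power 0 is read as ln(x) and that x must be positive.

```python
import math

# Conventional FP power menu; power 0 is read as ln(x), not x**0.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_transform(x, p):
    """Return the FP transform of a positive value x for menu power p."""
    if x <= 0:
        raise ValueError("FP transforms require x > 0")
    return math.log(x) if p == 0 else x ** p

# Show every menu transform of a single value, x = 4.
for p in FP_POWERS:
    print(p, round(fp_transform(4.0, p), 4))
```

Each transform is a single smooth function of x, which is why the menu behaves more predictably at the extremes than a stack of high powers.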
4) The FP functional forms (no “beta to the power” mistake)
A frequent misunderstanding is writing something like: “FP1 = slope × beta^power.”
That is not FP.
FP uses: beta × transformed x
Linear (1 df)
Form: y = c + b1·x
One straight-line effect for x.
FP1 (2 df in MFP)
Form: y = c + b1·(x^p)
p is chosen from the allowed power menu
b1 is estimated from the data
Interpretation: one smooth, monotonic curve is allowed (it can steepen or flatten, but it does not change direction).
FP2 (4 df in MFP)
Form: y = c + b1·(x^p1) + b2·(x^p2)
two powers are selected
two coefficients are estimated
Interpretation: more complex curves are allowed, including shapes with one turning point (a single maximum or minimum), such as J- or U-shaped relationships.
If the two selected powers are the same (p1 = p2 = p), FP2 uses the repeated-power form y = c + b1·(x^p) + b2·(x^p)·ln(x) (still two terms, still more flexible than FP1).
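The FP2 construction, including the repeated-power case, can be sketched as follows. The helper names fp_term and fp2_terms are hypothetical, and the power menu is assumed to be the conventional eight-value set. The last lines count how many candidate shapes each family offers under that assumption.

```python
import math
from itertools import combinations_with_replacement

# Assumed conventional FP power menu; power 0 is read as ln(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(x, p):
    """Single FP term for positive x."""
    return math.log(x) if p == 0 else x ** p

def fp2_terms(x, p1, p2):
    """Two FP2 basis terms for positive x.

    When p1 == p2, the second term is the first multiplied by ln(x),
    so the two terms stay distinct.
    """
    first = fp_term(x, p1)
    if p1 == p2:
        return first, first * math.log(x)
    return first, fp_term(x, p2)

# Candidate shapes: one power for FP1, an unordered pair (repeats allowed) for FP2.
fp1_candidates = list(FP_POWERS)
fp2_candidates = list(combinations_with_replacement(FP_POWERS, 2))
print(len(fp1_candidates), len(fp2_candidates))  # 8 36
```

The coefficients b1 and b2 then multiply these two terms; the powers act on x only.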
5) Why does MFP call FP1 “2 df” and FP2 “4 df”?
This is the most important conceptual point:
Final model parameters vs “selection df”
If you look only at the final fitted equation:
FP1 ends up with one coefficient for that predictor (b1).
FP2 ends up with two coefficients (b1 and b2).
So you might think FP1 “should be 1 df” and FP2 “should be 2 df.” But MFP is not only fitting coefficients; it is also selecting the power(s) from the menu, and each selected power is counted as one additional df. FP1 is therefore charged 2 df (one coefficient plus one power), and FP2 is charged 4 df (two coefficients plus two powers).
MFP’s df labels are best understood as “df used for flexibility and penalty in selection” (coefficients plus selected powers), not a simple count of coefficients in the final equation.
That is why MFP uses the conventional mapping:
1 df → linear allowed
2 df → FP1 allowed
4 df → FP2 allowed
It’s a practical system that reflects: “More candidate shapes considered → bigger complexity allowance → stronger penalty needed.”
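To see why the search over powers costs something, here is a minimal sketch of an FP1 grid search for a continuous outcome, using ordinary least squares in plain Python. It is an illustration under stated assumptions, not Stata's implementation: the helper names (fp_term, fit_line, best_fp1) and the toy data are made up for this example, and the power menu is assumed to be the conventional eight-value set.

```python
import math

# Assumed conventional FP power menu; power 0 is read as ln(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(x, p):
    """Single FP term for positive x."""
    return math.log(x) if p == 0 else x ** p

def fit_line(t, y):
    """Least squares for y = c + b*t; returns (c, b, residual sum of squares)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b = sxy / sxx
    c = y_bar - b * t_bar
    rss = sum((yi - (c + b * ti)) ** 2 for ti, yi in zip(t, y))
    return c, b, rss

def best_fp1(x, y):
    """Fit one model per menu power and keep the one with the smallest RSS."""
    fits = {p: fit_line([fp_term(xi, p) for xi in x], y) for p in FP_POWERS}
    best_p = min(fits, key=lambda p: fits[p][2])
    return best_p, fits[best_p]

# Hypothetical toy data generated exactly from y = 1 + 2*ln(x).
x = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
y = [1 + 2 * math.log(xi) for xi in x]
p, (c, b, rss) = best_fp1(x, y)
print("selected power:", p, "c:", round(c, 3), "b1:", round(b, 3))
```

The final equation has a single coefficient b1, but eight models were compared to find the power. That search is the reason FP1 is charged 2 df rather than 1.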
6) The “wire bending” interpretation of df (clinical intuition)
Think of the fitted relationship as a wire you are shaping through the data:
1 df (linear): stiff rod; can tilt but cannot bend
2 df (FP1): one smooth, monotonic curve; simple curvature
4 df (FP2): more bending; can capture J- or U-shaped patterns with a single turning point
More df = more ability to bend = more risk of “chasing noise.”
7) The penalty principle: why higher df must “earn it”
MFP does not automatically choose the most flexible shape. It asks:
Does a more complex curve improve fit enough to justify the extra flexibility?
This is why allowing df(4) does not force FP2. It simply gives the model permission to try FP2 and then reject it if the gain is too small.
Clinical translation: “Complexity is taxed.” To adopt FP2, the model must show a convincing improvement over FP1/linear.
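One way to picture the “tax” is the closed test sequence described in the MFP literature: FP2 is first compared with omitting the variable (4 df), then with a straight line (3 df), then with FP1 (2 df), each time using the drop in deviance as a chi-square statistic. The sketch below follows that logic under simplifying assumptions: the function names are hypothetical, a single alpha is used for all three steps (real implementations may use separate levels for variable inclusion and for function selection), and the deviances in the example are made-up numbers.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 to 4 only."""
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * stat / math.pi) * math.exp(-half)
    if df == 4:
        return math.exp(-half) * (1.0 + half)
    raise ValueError("this sketch only supports df 1 to 4")

def select_form(dev_null, dev_linear, dev_fp1, dev_fp2, alpha=0.05):
    """Choose among omit, linear, FP1, FP2 from model deviances (-2 log likelihood)."""
    # Step 1: any effect at all? Best FP2 vs omitting x, charged 4 df.
    if chi2_sf(dev_null - dev_fp2, 4) >= alpha:
        return "omit"
    # Step 2: any non-linearity? Best FP2 vs linear, charged 3 df.
    if chi2_sf(dev_linear - dev_fp2, 3) >= alpha:
        return "linear"
    # Step 3: is the second term needed? Best FP2 vs best FP1, charged 2 df.
    if chi2_sf(dev_fp1 - dev_fp2, 2) >= alpha:
        return "FP1"
    return "FP2"

# Hypothetical deviances: FP2 improves on FP1 by only 1.5, so FP1 is kept.
print(select_form(dev_null=520.0, dev_linear=500.0, dev_fp1=490.0, dev_fp2=488.5))
```

The simpler form wins every tie: FP2 is adopted only if it passes all three comparisons.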
8) How FP df differs from the textbook example: “y = c + x + x² + x³”
The high-school polynomial example is:
y = c + b1·x + b2·x² + b3·x³
That is a forced polynomial basis with degrees 1–3.
FP is different:
it does not automatically include x, x², x³ together
it chooses one or two transformed terms from a restricted menu
it is designed to behave more sensibly at extremes and avoid unstable oscillation
So “df(4)” in FP does not mean “add x⁴.” It means “allow a second FP term and the broader family of shapes that come with it.”
9) Practical guidance: how to choose df in clinical research
When df(1) is reasonable
very small datasets
few events (logistic/Cox)
strong desire for simple interpretation
exposure range is narrow and linearity is plausible
When df(2) is a strong default
moderate sample size
you suspect curvature but want protection against overfit
you want a curve that is still interpretable
When df(4) is justified
large sample size and/or many events
strong biological reason for multi-phase patterns
you will validate internally (bootstrap/CV) or externally
you need high-fidelity risk modeling (especially prediction work)
A practical warning sign: if you allow many variables to have df(4) in a modest dataset, you can unintentionally build a very flexible model that looks “excellent” but generalizes poorly.
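A quick way to make that warning concrete is to add up the maximum df you are permitting across all predictors and set it against the number of events. The sketch below does only that arithmetic; the function name, variable names, df settings, and event count are all hypothetical, and no particular cut-off is implied.

```python
def flexibility_budget(max_df_by_var, n_events):
    """Return (total maximum df allowed, events per allowed df)."""
    total_df = sum(max_df_by_var.values())
    return total_df, n_events / total_df

# Hypothetical settings: three predictors allowed FP2, one FP1, one binary term.
max_df = {"age": 4, "hemoglobin": 4, "creatinine": 4, "bmi": 2, "sex": 1}
total_df, events_per_df = flexibility_budget(max_df, n_events=90)
print(total_df, events_per_df)
```

Lowering some predictors from df(4) to df(2) shrinks the total and shows how much flexibility the dataset is being asked to support.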
10) Reporting template (ready to paste, journal-style)
“Continuous predictors were examined for non-linear associations using multivariable fractional polynomials. For each continuous covariate, linear, first-order fractional polynomial, and second-order fractional polynomial functional forms were considered (corresponding to 1, 2, and 4 degrees of freedom, respectively). Selection of functional form followed a structured procedure that penalizes additional degrees of freedom to reduce overfitting. A more complex functional form was retained for a predictor only when it provided a meaningful improvement in model fit over the simpler alternative.”
11) Common mistakes (and the correct fixes)
Mistake 1: “df(4) means x⁴.”
Fix: df(4) in FP means “allow FP2,” not “force a 4th-degree polynomial.”
Mistake 2: “FP1 = beta^power.”
Fix: FP uses beta × (x^p). The power applies to x, not beta.
Mistake 3: Applying log/negative powers to variables that include 0 or negatives.
Fix: FP transformations like ln(x) or x⁻¹ require x to be positive. If a variable can be ≤0, consider clinical recoding, shifting (with justification), or alternative approaches.
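If shifting is the chosen route, one possible convention is sketched below: add a constant so that the smallest value becomes equal to the smallest gap between distinct observed values. This is an assumption for illustration, not a statement of what any particular software does by default, and shift_to_positive is a hypothetical helper. Whatever constant is used changes the fitted curve, so it should be reported alongside the selected powers.

```python
def shift_to_positive(values):
    """Shift values so the minimum equals the smallest gap between distinct values.

    Returns (shifted values, shift added). Values that are already all
    positive are returned unchanged with a shift of 0.0.
    """
    distinct = sorted(set(values))
    if distinct[0] > 0:
        return list(values), 0.0
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values to choose a shift")
    gap = min(b - a for a, b in zip(distinct, distinct[1:]))
    shift = gap - distinct[0]
    return [v + shift for v in values], shift

# Hypothetical variable that includes zero and a negative value.
shifted, shift = shift_to_positive([-2.0, 0.0, 0.5, 3.0])
print(shifted, shift)
```

After the shift every value is strictly positive, so ln(x) and negative powers are defined.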
Conclusion
In FP/MFP, degrees of freedom are best understood as a controlled flexibility budget for shaping the predictor–outcome relationship. Linear (1 df) is rigid, FP1 (2 df) allows simple curvature, and FP2 (4 df) allows richer curvature—while imposing a stronger penalty to prevent overfitting. Crucially, FP df are not the same as polynomial degree from basic algebra. FP does not mean “use x¹ + x² + x³ + x⁴.” It means “choose one or two transformed terms from a restricted menu to capture clinically plausible non-linearity with discipline.”





