
From Regression to Neural Networks: A Conceptual Bridge for Clinical Researchers

Abstract

Logistic regression and neural networks share a deep mathematical and conceptual structure. Both compute weighted sums of predictors, pass them through activation functions, and produce structured prediction surfaces. What neural networks automate through layers, regression achieves explicitly through polynomial or spline transformations. Understanding this bridge allows clinical researchers to visualize how each term in a regression equation corresponds to a neuron’s operation—and why including both X and X² terms is essential to capture direction and curvature simultaneously.

1. The Shared Foundation: Weighted Summation as a Neural Operation

Every neural network starts with a summation node, a neuron that aggregates weighted inputs and adds a bias term:

z = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ

For logistic regression, this is identical. The output is then transformed through a logistic activation:

p = 1 / (1 + e^(−z))

This produces the familiar S-shaped probability curve.

  • In Stata,

    logit CHFS_bincutoff2B c.MREkPa

    creates exactly one neuron: one linear input (MREkPa) passed through a sigmoid activation. The graph is a smooth, monotonic S-curve, either always rising or always falling as MREkPa increases.
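As a minimal sketch of that single neuron in Stata (assuming CHFS_bincutoff2B and MREkPa are already in memory; the variable name p_lin is illustrative):

logit CHFS_bincutoff2B c.MREkPa
predict p_lin, pr        // post-activation output: the predicted probability
twoway (line p_lin MREkPa, sort), ytitle("Pr(outcome)") xtitle("MRE (kPa)")

Plotting p_lin against MREkPa reproduces the monotonic S-curve described above.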

2. Building Curvature: Each Term as a “Node” with Its Own Shape

When you add a quadratic term, the equation becomes:

logit(p) = β₀ + β₁X + β₂X²

Each component forms its own subgraph:

Term | Operation | Graph Shape | Neural Analogy
β₀ | Bias | Horizontal shift | Node bias
β₁X | Linear node | Straight line (direction) | First neuron
β₂X² | Quadratic node | Parabolic curve (bend) | Second neuron

Each of these nodes produces its own activation shape before being combined. The logistic function then compresses that combined curve into probabilities (0–1).

Therefore:

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

creates a composite decision surface: the logistic activation of a weighted sum of two distinct shapes (one linear, one curved).

That’s why you cannot fit only c.MREkPa2: c.MREkPa2 alone would force the model to build a symmetric U-shaped probability curve, losing the main trend direction. Including both terms allows the output graph to “lean” — rise, bend, then plateau — exactly as seen in real biomarkers.
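A minimal Stata sketch of this contrast (assuming MREkPa2 already holds the square of MREkPa; the names p_sq and p_both are illustrative):

* quadratic term only: forces a roughly symmetric U- or inverted-U-shaped surface
logit CHFS_bincutoff2B c.MREkPa2
predict p_sq, pr

* linear + quadratic: the curve can lean, bend, and plateau
logit CHFS_bincutoff2B c.MREkPa c.MREkPa2
predict p_both, pr

twoway (line p_sq MREkPa, sort) (line p_both MREkPa, sort), legend(order(1 "X2 only" 2 "X + X2"))

Overlaying the two predicted curves makes the lost directionality of the X²-only model visible.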

3. The Graph-Building Logic: From Node to Outcome

Let’s visualize the operation conceptually:

Step 1. Compute individual node contributions

1️⃣ Linear node (X)

→ Graph: Straight increasing or decreasing line.

2️⃣ Quadratic node (X²)

→ Graph: U- or inverted-U shape (curved).
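For example, with illustrative weights β₁ = 1.2 and β₂ = −0.08 (not fitted values), the linear node contributes 1.2·X, a rising straight line, while the quadratic node contributes −0.08·X², an inverted-U bend whose influence grows as X increases.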

Step 2. Combine into a single pre-activation layer

z = β₀ + β₁X + β₂X²

This “summation graph” is the raw decision surface, often a smooth hump or sigmoid-like curve depending on the signs of the β coefficients.

Step 3. Apply the activation (logistic link)

p = 1 / (1 + e^(−z))

Now the curve becomes a bounded, clinically interpretable probability, capturing both the general trend (from X) and the curvature (from X²).

This is precisely what a shallow neural network does: Each neuron creates a shape, then the activation compresses and fuses them into a pattern that matches the observed data.
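These three steps can be traced explicitly in Stata after fitting the model (a sketch; xb_hat, p_hat, and p_check are illustrative names):

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

* Step 2: the pre-activation layer (linear predictor on the logit scale)
predict xb_hat, xb

* Step 3: the logistic activation, applied by hand
gen p_hat = invlogit(xb_hat)

* p_hat matches Stata's own predicted probability
predict p_check, pr
assert reldif(p_hat, p_check) < 1e-6 if !missing(p_hat)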

4. Polynomial Regression as “Manual Feature Learning”

Modeling Level | Regression Equation | Neural Layer Analogy | Output Pattern
Linear | β₀ + β₁X | One neuron | Monotonic (↑ or ↓)
Quadratic | β₀ + β₁X + β₂X² | Two neurons (one linear, one curved) | Sigmoid with bend or plateau
Spline | β₀ + Σ β_k f_k(X) | Multi-node hidden layer | Smooth flexible curve
Deep NN | Learned nonlinear functions | Multiple hidden layers | Complex, multi-peak surface

This “manual feature learning” in regression explicitly mirrors the automated hidden layer learning in neural networks. The difference is transparency: regression tells you exactly which shape each feature contributes.
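A sketch of the “multi-node hidden layer” row using restricted cubic splines (Stata’s mkspline; the stub name MRE_sp and the choice of 4 knots are illustrative):

* each spline basis variable acts as one hand-built "hidden node"
mkspline MRE_sp = MREkPa, cubic nknots(4)    // creates MRE_sp1 to MRE_sp3
logit CHFS_bincutoff2B MRE_sp*
predict p_spline, pr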

5. Why Including Both X and X² Is Clinically and Mathematically Correct

Scenario | Model | Result | Interpretation
Only X² | logit Y c.X2 | Symmetrical U-shape centered near 0 | No directionality; biologically implausible
X + X² | logit Y c.X c.X2 | Asymmetric curve with slope and bend | Captures both baseline trend and saturation

The linear term defines direction (does risk rise or fall?); the quadratic term defines curvature (does it plateau or bend?). Together, they form the biologically realistic S-shaped or saturating response seen in continuous biomarkers (MRE, AST, ALT, FIB-4).
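Because the X²-only model is nested in the X + X² model, the contribution of the linear (direction) term can be checked with a likelihood-ratio test; a sketch, with illustrative estimate names:

logit CHFS_bincutoff2B c.MREkPa2
estimates store sq_only

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2
estimates store full

lrtest sq_only full        // tests whether adding the linear term improves fit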

6. Clinical Illustration: Fibrosis Probability by MREkPa

Term | Interpretation
β₁ (linear) | Overall direction: higher MRE increases fibrosis probability
β₂ (quadratic) | Adjustment for curvature: captures flattening or downturn
Combined | Clinically realistic sigmoid-type relationship: rises fast at first, then levels off

Graphically, each term draws its own subcurve. Their combination forms a “master curve.” The logistic transformation then compresses it to 0–1, producing the final probability graph familiar to clinicians.
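A sketch of those subcurves after fitting the model (the generated variable names are illustrative; coefficients come from Stata’s stored results _b[]):

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

gen lin_part  = _b[MREkPa] * MREkPa                 // subcurve from the linear node
gen quad_part = _b[MREkPa2] * MREkPa2               // subcurve from the quadratic node
gen master    = _b[_cons] + lin_part + quad_part    // combined "master curve" (logit scale)
gen prob      = invlogit(master)                    // compressed to the 0-1 probability scale

twoway (line lin_part MREkPa, sort) (line quad_part MREkPa, sort) (line prob MREkPa, sort), legend(order(1 "linear" 2 "quadratic" 3 "probability"))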

Hence, the logit model’s geometry is neural-like: a sum of shapes transformed into a bounded outcome.

7. If Y Is Continuous — The Same Logic Applies

For continuous outcomes, the neural analogy still holds, but the activation is identity (no sigmoid compression).
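In equation form, the prediction is simply E[Y | X] = β₀ + β₁X + β₂X², with no link transformation applied to the weighted sum.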

Feature | Linear Regression | Logistic Regression | Neural Network Equivalent
Input layer | Predictors (X) | Predictors (X) | Inputs
Weights | β₁, β₂, … | β₁, β₂, … | Learned weights
Bias | β₀ | β₀ | Bias node
Activation | Identity | Sigmoid (logit) | Nonlinear activation
Output | Continuous | Probability (0–1) | Output neuron

So whether you run:

regress ALT c.MREkPa c.MREkPa2

or

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

you are performing the same neural operation, summing nodes and shaping outputs, differing only in the activation applied to Y.
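A sketch contrasting the two activations on the same predictors (the names yhat_linear and phat_logit are illustrative):

* identity activation: the prediction is the weighted sum itself
regress ALT c.MREkPa c.MREkPa2
predict yhat_linear, xb

* sigmoid activation: the same kind of weighted sum, compressed to a probability
logit CHFS_bincutoff2B c.MREkPa c.MREkPa2
predict phat_logit, pr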

8. Summary: Seeing the Neural Pattern Inside Every Regression

Concept | Regression Term | Neural Analogy | Graph Effect
Intercept (β₀) | Bias | Bias node | Shifts curve vertically
Linear term (β₁X) | Weighted input | Node 1 | Sets direction
Quadratic term (β₂X²) | Nonlinear input | Node 2 | Creates curvature
Logistic link | Sigmoid activation | Output neuron | Compresses to 0–1
Combined output | Predicted p(Y=1) | Neural output | Clinical probability



Key Takeaway

In regression, each term is an operation that creates a graph. The model combines these shapes systematically, applies a link function, and produces a final patterned output, exactly as a neural network does through its layered structure. That is why the correct model is

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

and not c.MREkPa2 alone: every neural-like model must preserve both direction and curvature to form a coherent and interpretable clinical pattern.
