
From Regression to Neural Networks: A Conceptual Bridge for Clinical Researchers

Abstract

Logistic regression and neural networks share a deep mathematical and conceptual structure. Both compute weighted sums of predictors, pass them through activation functions, and produce structured prediction surfaces. What neural networks automate through layers, regression achieves explicitly through polynomial or spline transformations. Understanding this bridge allows clinical researchers to visualize how each term in a regression equation corresponds to a neuron’s operation—and why including both X and X² terms is essential to capture direction and curvature simultaneously.

1. The Shared Foundation: Weighted Summation as a Neural Operation

Every neural network starts with a summation node, a neuron that aggregates weighted inputs and adds a bias term:

z = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ

For logistic regression, this is identical. The output is then transformed through a logistic activation:

p = 1 / (1 + e^(−z))

This produces the familiar S-shaped probability curve.

  • In Stata,

    logit CHFS_bincutoff2B c.MREkPa

    creates exactly one neuron: one linear input (MREkPa) passed through a sigmoid activation. The graph is a smooth, monotonic S-curve, either always rising or always falling as MREkPa increases.
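As a minimal sketch of that single neuron in Stata (assuming CHFS_bincutoff2B and MREkPa are already in memory; the variable name p_lin is illustrative):

logit CHFS_bincutoff2B c.MREkPa
predict p_lin, pr        // post-activation output: the predicted probability
twoway (line p_lin MREkPa, sort), ytitle("Pr(outcome)") xtitle("MRE (kPa)")

Plotting p_lin against MREkPa reproduces the monotonic S-curve described above.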

2. Building Curvature: Each Term as a “Node” with Its Own Shape

When you add a quadratic term, the equation becomes:

logit(p) = β₀ + β₁X + β₂X²

Each component forms its own subgraph:

Term | Operation | Graph Shape | Neural Analogy
β₀ | Bias | Horizontal shift | Node bias
β₁X | Linear node | Straight line (direction) | First neuron
β₂X² | Quadratic node | Parabolic curve (bend) | Second neuron

Each of these nodes produces its own activation shape before being combined. The logistic function then compresses that combined curve into probabilities (0–1).

Therefore:

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

creates a composite decision surface: the logistic activation of a weighted sum of two distinct shapes (one linear, one curved).

That’s why you cannot fit only c.MREkPa2: c.MREkPa2 alone would force the model to build a symmetric U-shaped probability curve, losing the main trend direction. Including both terms allows the output graph to “lean” — rise, bend, then plateau — exactly as seen in real biomarkers.
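A minimal Stata sketch of this contrast (assuming MREkPa2 already holds the square of MREkPa; the names p_sq and p_both are illustrative):

* quadratic term only: forces a roughly symmetric U- or inverted-U-shaped surface
logit CHFS_bincutoff2B c.MREkPa2
predict p_sq, pr

* linear + quadratic: the curve can lean, bend, and plateau
logit CHFS_bincutoff2B c.MREkPa c.MREkPa2
predict p_both, pr

twoway (line p_sq MREkPa, sort) (line p_both MREkPa, sort), legend(order(1 "X2 only" 2 "X + X2"))

Overlaying the two predicted curves makes the lost directionality of the X²-only model visible.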

3. The Graph-Building Logic: From Node to Outcome

Let’s visualize the operation conceptually:

Step 1. Compute individual node contributions

1️⃣ Linear node (X)

→ Graph: Straight increasing or decreasing line.

2️⃣ Quadratic node (X²)

→ Graph: U- or inverted-U shape (curved).
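For example, with illustrative weights β₁ = 1.2 and β₂ = −0.08 (not fitted values), the linear node contributes 1.2·X, a rising straight line, while the quadratic node contributes −0.08·X², an inverted-U bend whose influence grows as X increases.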

Step 2. Combine into a single pre-activation layer

z = β₀ + β₁X + β₂X²

This “summation graph” is the raw decision surface, often a smooth hump or sigmoid-like curve depending on the signs of the β coefficients.

Step 3. Apply the activation (logistic link)

p = 1 / (1 + e^(−z))

Now the curve becomes a bounded, clinically interpretable probability, capturing both the general trend (from X) and the curvature (from X²).

This is precisely what a shallow neural network does: Each neuron creates a shape, then the activation compresses and fuses them into a pattern that matches the observed data.
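These three steps can be traced explicitly in Stata after fitting the model (a sketch; xb_hat, p_hat, and p_check are illustrative names):

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

* Step 2: the pre-activation layer (linear predictor on the logit scale)
predict xb_hat, xb

* Step 3: the logistic activation, applied by hand
gen p_hat = invlogit(xb_hat)

* p_hat matches Stata's own predicted probability
predict p_check, pr
assert reldif(p_hat, p_check) < 1e-6 if !missing(p_hat)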

4. Polynomial Regression as “Manual Feature Learning”

Modeling Level | Regression Equation | Neural Layer Analogy | Output Pattern
Linear | β₀ + β₁X | One neuron | Monotonic (↑ or ↓)
Quadratic | β₀ + β₁X + β₂X² | Two neurons (one linear, one curved) | Sigmoid with bend or plateau
Spline | β₀ + Σ β_k f_k(X) | Multi-node hidden layer | Smooth flexible curve
Deep NN | Learned nonlinear functions | Multiple hidden layers | Complex, multi-peak surface

This “manual feature learning” in regression explicitly mirrors the automated hidden layer learning in neural networks. The difference is transparency: regression tells you exactly which shape each feature contributes.
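A sketch of the “multi-node hidden layer” row using restricted cubic splines (Stata’s mkspline; the stub name MRE_sp and the choice of 4 knots are illustrative):

* each spline basis variable acts as one hand-built "hidden node"
mkspline MRE_sp = MREkPa, cubic nknots(4)    // creates MRE_sp1 to MRE_sp3
logit CHFS_bincutoff2B MRE_sp*
predict p_spline, pr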

5. Why Including Both X and X² Is Clinically and Mathematically Correct

Scenario | Model | Result | Interpretation
Only X² | logit Y c.X2 | Symmetrical U-shape centered near 0 | No directionality; biologically implausible
X + X² | logit Y c.X c.X2 | Asymmetric curve with slope and bend | Captures both baseline trend and saturation

The linear term defines direction (does risk rise or fall?); the quadratic term defines curvature (does it plateau or bend?). Together, they form the biologically realistic S-shaped or saturating response seen in continuous biomarkers (MRE, AST, ALT, FIB-4).
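Because the X²-only model is nested in the X + X² model, the contribution of the linear (direction) term can be checked with a likelihood-ratio test; a sketch, with illustrative estimate names:

logit CHFS_bincutoff2B c.MREkPa2
estimates store sq_only

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2
estimates store full

lrtest sq_only full        // tests whether adding the linear term improves fit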

6. Clinical Illustration: Fibrosis Probability by MREkPa

Term | Interpretation
β₁ (linear) | Overall direction: higher MRE increases fibrosis probability
β₂ (quadratic) | Adjustment for curvature: captures flattening or downturn
Combined | Clinically realistic sigmoid-type relationship: rises fast at first, then levels off

Graphically, each term draws its own subcurve. Their combination forms a “master curve.” The logistic transformation then compresses it to 0–1, producing the final probability graph familiar to clinicians.
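A sketch of those subcurves after fitting the model (the generated variable names are illustrative; coefficients come from Stata’s stored results _b[]):

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

gen lin_part  = _b[MREkPa] * MREkPa                 // subcurve from the linear node
gen quad_part = _b[MREkPa2] * MREkPa2               // subcurve from the quadratic node
gen master    = _b[_cons] + lin_part + quad_part    // combined "master curve" (logit scale)
gen prob      = invlogit(master)                    // compressed to the 0-1 probability scale

twoway (line lin_part MREkPa, sort) (line quad_part MREkPa, sort) (line prob MREkPa, sort), legend(order(1 "linear" 2 "quadratic" 3 "probability"))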

Hence, the logit model’s geometry is neural-like: a sum of shapes transformed into a bounded outcome.

7. If Y Is Continuous — The Same Logic Applies

For continuous outcomes, the neural analogy still holds, but the activation is identity (no sigmoid compression).
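In equation form, the prediction is simply E[Y | X] = β₀ + β₁X + β₂X², with no link transformation applied to the weighted sum.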

Feature | Linear Regression | Logistic Regression | Neural Network Equivalent
Input layer | Predictors (X) | Predictors (X) | Inputs
Weights | β₁, β₂, … | β₁, β₂, … | Learned weights
Bias | β₀ | β₀ | Bias node
Activation | Identity | Sigmoid (logit) | Nonlinear activation
Output | Continuous | Probability (0–1) | Output neuron

So whether you run:

regress ALT c.MREkPa c.MREkPa2

or

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

you are performing the same neural operation, summing nodes and shaping outputs, differing only in the activation applied to Y.
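A sketch contrasting the two activations on the same predictors (the names yhat_linear and phat_logit are illustrative):

* identity activation: the prediction is the weighted sum itself
regress ALT c.MREkPa c.MREkPa2
predict yhat_linear, xb

* sigmoid activation: the same kind of weighted sum, compressed to a probability
logit CHFS_bincutoff2B c.MREkPa c.MREkPa2
predict phat_logit, pr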

8. Summary: Seeing the Neural Pattern Inside Every Regression

Concept | Regression Term | Neural Analogy | Graph Effect
Intercept (β₀) | Bias | Bias node | Shifts curve vertically
Linear term (β₁X) | Weighted input | Node 1 | Sets direction
Quadratic term (β₂X²) | Nonlinear input | Node 2 | Creates curvature
Logistic link | Sigmoid activation | Output neuron | Compresses to 0–1
Combined output | Predicted p(Y=1) | Neural output | Clinical probability



Key Takeaway

In regression, each term is an operation that creates a graph. The model combines these shapes systematically, applies a link function, and produces a final patterned output, exactly as a neural network does through its layered structure. That is why the correct model is

logit CHFS_bincutoff2B c.MREkPa c.MREkPa2

and not c.MREkPa2 alone: every neural-like model must preserve both direction and curvature to form a coherent and interpretable clinical pattern.
