Hybrid Gaussian–ReLU Neural Networks (Switch–Scale Architecture)

Follow-up Concept: Hybrid Gaussian–ReLU Neural Networks

Motivation

Pure Gaussian-smoothing networks provide strong inductive bias for smooth numerical approximation, but struggle with sharp regime changes, thresholds, and piecewise structure.
Conversely, ReLU-based networks handle switching behavior well but are sample-inefficient and unstable for smooth regression.

Many real numerical systems are mostly smooth except when they are not.

This follow-up explores a hybrid architecture that explicitly combines:

Gaussian pathways for scale, smoothness, and numerical stability
ReLU pathways for switching, sparsity, and regime selection

The goal is to avoid forcing one mechanism to solve both problems.

Core Idea

Each layer contains parallel feature paths:

A Gaussian path that performs linear aggregation followed by controlled smoothing
A ReLU path that performs standard affine + ReLU transformations

The two paths are then mixed or gated, allowing the network to represent both smooth structure and sharp transitions.

Architectural Sketch

Let (h_\ell) denote the layer input.

Parallel Update

Gaussian path: [ h^{(G)}{\ell+1} = \text{GaussianBlock}(h\ell) ]
ReLU path: [ h^{(R)}{\ell+1} = \text{ReLUBlock}(h\ell) ]

Combination Strategies

Linear Mixing [ h_{\ell+1} = A_G h^{(G)}{\ell+1} + A_R h^{(R)}{\ell+1} ]

Gated Mixing

ReLU gates Gaussian: [ h_{\ell+1} = g(h^{(R)}{\ell+1}) \odot h^{(G)}{\ell+1} ]
Gaussian stabilizes ReLU: [ h_{\ell+1} = h^{(R)}{\ell+1} + \text{Smooth}(h^{(R)}{\ell+1}) ]

The gated variants explicitly encode regime selection + local approximation.

Loss Functions and Training Objectives

Motivation for Specialized Losses

When data contain multiple regimes or sharp transitions, standard MSE encourages mode averaging, which is numerically incorrect.

Hybrid architectures benefit from objectives that:

allow specialization,
tolerate occasional regime misassignment,
discourage collapse to a single pathway.

Candidate Loss Designs

1. Mixture-of-Experts Regression Loss

Two predictors with a learned gate: [ \mathcal L = -\log\left( \pi(x)\,p(y\mid \hat y_G)

(1-\pi(x))\,p(y\mid \hat y_R) \right) ]

Encourages:

Gaussian path to model smooth core behavior
ReLU path to model regime-specific corrections or alternatives

2. Robust Regression Losses

Huber or pseudo-Huber losses reduce sensitivity to transient regime errors while preserving quadratic behavior near the optimum.

Useful when:

ReLU path introduces sharp but sparse deviations
numerical stability matters more than exact boundary fitting

3. Multi-Head Objectives

Separate objectives for:

regime detection (classification or gating loss)
magnitude prediction (regression loss)

Prevents a single scalar output from carrying incompatible responsibilities.

Optimization Considerations

Key risks:

ReLU path dominating due to expressivity
gate collapse in mixture formulations
under-utilization of the Gaussian pathway

Mitigations:

entropy or load-balancing regularization on gates (early training)
staged training (Gaussian path pretraining)
different learning rates for gate vs predictors
explicit smoothness regularization on ReLU outputs

Expected Benefits

Improved sample efficiency on smooth numerical tasks
Better handling of regime changes without oversmoothing
More stable derivatives and extrapolation behavior
Cleaner separation between “where” and “how” behavior changes

When This Architecture Should Help

Piecewise-smooth regression problems
Scientific and physical models with thresholds or phase changes
PDE surrogate modeling with boundary effects
Numerical reasoning tasks with conditional structure

Conceptual Summary

Gaussian paths encode how values vary.
ReLU paths encode when behavior changes.

A hybrid model allows each to do what it is naturally good at, instead of forcing a single activation mechanism to approximate incompatible structures.

Status

Exploratory follow-up.
Intended as a controlled extension of Gaussian smoothing networks, with clear ablation paths and numerical benchmarks.

Edit this page on GitHub