SDI: Structural Divergence Index
Predicting fine-tuning degradation from model structure, not benchmarks.
SDI is a composite metric that quantifies geometric and spectral shifts between a base model and its fine-tuned variant. It predicts performance degradation without running full benchmark suites, requiring only lightweight probe inference. The goal: reduce model validation from hours of GPU benchmarking to minutes of structural analysis.
- Scan time (7B models): under 10 minutes
- Structural signals: 4
- Model families: 5
The Problem
Fine-tuning foundation models introduces unpredictable behavioral regressions. A model fine-tuned for medical question answering might lose its ability to follow instructions. A model adapted for code generation might start hallucinating more. Organizations discover these regressions only after running expensive benchmark suites or, worse, after deployment.
A single benchmark pass on a 7B parameter model takes hours of GPU time. Organizations running dozens of fine-tunes per week cannot evaluate every candidate. The result: degraded models reach production.
How SDI Works
Spectral Divergence
Singular value decomposition (SVD) of each layer's weight matrix before and after fine-tuning. Measures structural deformation of the learned transformation, not just the magnitude of change.
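A minimal sketch of how such a per-layer score might be computed. The function name and the choice of Jensen-Shannon divergence between normalized spectra are illustrative assumptions, not the project's specification:

```python
import numpy as np

def spectral_divergence(w_base, w_tuned):
    """Illustrative: Jensen-Shannon divergence between the
    normalized singular value spectra of one layer's weights
    before and after fine-tuning."""
    s_base = np.linalg.svd(w_base, compute_uv=False)
    s_tuned = np.linalg.svd(w_tuned, compute_uv=False)
    p = s_base / s_base.sum()          # spectrum as a distribution
    q = s_tuned / s_tuned.sum()
    m = 0.5 * (p + q)
    eps = 1e-12                        # numerical guard for the log
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
w_tuned = w_base + 0.3 * rng.normal(size=(64, 64))
print(spectral_divergence(w_base, w_base))   # identical weights -> 0.0
print(spectral_divergence(w_base, w_tuned))  # deformed spectrum -> positive
```

Comparing normalized spectra rather than raw weight deltas is what makes this a shape measure: uniformly scaling a layer leaves the score at zero.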
Representation Drift (CKA)
Centered Kernel Alignment between base and fine-tuned model activations on a fixed 1,000-sample probe set. Detects whether internal representations have shifted.
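The linear form of CKA from Kornblith et al. (2019) is cheap to compute once probe activations are collected; this sketch assumes activations arrive as (samples x features) matrices and is not tied to any particular model:

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape
    (n_samples, n_features); feature counts may differ."""
    x = x - x.mean(axis=0)             # center each feature column
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)

rng = np.random.default_rng(1)
base_acts = rng.normal(size=(1000, 128))    # fixed 1,000-sample probe set
tuned_acts = base_acts + 0.5 * rng.normal(size=(1000, 128))
print(linear_cka(base_acts, base_acts))     # identical activations -> 1.0
print(linear_cka(base_acts, tuned_acts))    # drifted representations -> below 1
```

CKA is invariant to orthogonal transformation and isotropic scaling of either activation space, which is why it detects genuine representational shift rather than benign reparameterization.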
Curvature Shift
Hessian trace estimates via Hutchinson's stochastic estimator at both checkpoints. Detects sharp-to-flat transitions that correlate with generalization changes.
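Hutchinson's estimator needs only Hessian-vector products, which autograd frameworks supply via a double backward pass. The sketch below substitutes an explicit symmetric matrix for the Hessian to show the mechanics; the function name and sample count are illustrative:

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=200, seed=0):
    """Estimate tr(H) from Hessian-vector products alone,
    using Rademacher probes: tr(H) ~= E[v^T H v]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / n_samples

# Demo: an explicit symmetric PSD matrix stands in for the Hessian.
# In practice hvp(v) would be an autograd Hessian-vector product.
rng = np.random.default_rng(42)
a = rng.normal(size=(50, 50))
h = a @ a.T
est = hutchinson_trace(lambda v: h @ v, dim=50)
print(est, np.trace(h))  # estimate lands close to the exact trace
```

The point of the estimator is that it never materializes the Hessian, so the same loop scales to billion-parameter checkpoints where the full matrix is unrepresentable.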
Weight Geometry
Per-layer L2 distance normalized by layer size, weighted by layer depth. Captures raw magnitude of parameter shift across the network.
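A sketch of the depth-weighted distance; the linear depth weighting and per-parameter normalization shown here are one plausible reading of the description, not a committed formula:

```python
import numpy as np

def weight_geometry(base_layers, tuned_layers):
    """Mean over layers of size-normalized L2 parameter shift,
    weighted linearly by depth (weighting scheme is illustrative)."""
    n = len(base_layers)
    total = 0.0
    for i, (wb, wt) in enumerate(zip(base_layers, tuned_layers)):
        shift = np.linalg.norm(wt - wb) / np.sqrt(wb.size)  # per-parameter scale
        total += ((i + 1) / n) * shift                      # deeper layers count more
    return total / n

rng = np.random.default_rng(2)
base = [rng.normal(size=(32, 32)) for _ in range(4)]
tuned = [w + 0.05 * rng.normal(size=w.shape) for w in base]
print(weight_geometry(base, base))   # no shift -> 0.0
print(weight_geometry(base, tuned))  # small positive shift
```

Normalizing by layer size keeps a 4096-wide projection from dominating the score over a small embedding table purely by parameter count.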
Scientific Foundation
SDI combines four independently replicated research results; no novel theoretical claims are required. The innovation is engineering these known signals into a validated, predictive governance tool.
- Martin & Mahoney (2021). Implicit self-regularization in deep neural networks. JMLR.
- Kornblith et al. (2019). Similarity of neural network representations revisited. ICML.
- Keskar et al. (2017). On large-batch training for deep learning: generalization gap and sharp minima. ICLR.
- Aghajanyan et al. (2021). Intrinsic dimensionality explains the effectiveness of language model fine-tuning. ACL.
Phase I Plan (NSF SBIR, $305K, 9 Months)
Objective 1: Define and Formalize SDI
Mathematical specification and reproducible computation pipeline. Open-source implementation. Under 10 minutes for 7B models.
Objective 2: Fine-Tune Regression Dataset
50+ base-to-fine-tune pairs across 5 model families (Llama, Mistral, Phi, Gemma, Qwen). Domain, instruction, and deliberately degraded fine-tunes. Full benchmark evaluation on both endpoints.
Objective 3: Validate Predictive Correlation
Spearman rho ≥ 0.7 between SDI and degradation magnitude across MMLU, IFEval, ToxiGen, and TruthfulQA. False negative rate < 15% for high-regression cases.
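The acceptance criterion reduces to a plain rank correlation between SDI scores and measured benchmark drops. A minimal sketch with made-up numbers (both the data and the threshold wiring are illustrative):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation; no tie handling, which is
    adequate for continuous SDI and degradation scores."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))

# Hypothetical scores: SDI per fine-tune vs measured benchmark drop.
sdi = np.array([0.10, 0.40, 0.20, 0.80, 0.60, 0.30])
degradation = np.array([0.02, 0.10, 0.05, 0.30, 0.22, 0.07])
rho = spearman_rho(sdi, degradation)
print(rho >= 0.7)  # perfectly monotone toy data, so True
```

Rank correlation is the right target here because SDI only needs to order fine-tunes by risk; it does not need to predict the absolute size of the benchmark drop.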
