SDI: Structural Divergence Index
Predicting fine-tuning degradation from model structure, not benchmarks.
SDI is a composite metric that quantifies geometric and spectral shifts between a base model and its fine-tuned variant. It predicts performance degradation without running full benchmark suites, requiring only lightweight probe inference. The goal: reduce model validation from hours of GPU benchmarking to minutes of structural analysis.
- Scan time (7B models): under 10 minutes
- Structural signals: 4
- Model families: 5
The Problem
Fine-tuning foundation models introduces unpredictable behavioral regressions. A model fine-tuned for medical question answering might lose its ability to follow instructions. A model adapted for code generation might start hallucinating more. Organizations discover these regressions only after running expensive benchmark suites or, worse, after deployment.
A single benchmark pass on a 7B parameter model takes hours of GPU time. Organizations running dozens of fine-tunes per week cannot evaluate every candidate. The result: degraded models reach production.
How SDI Works
Spectral Divergence
Singular value decomposition (SVD) of each layer's weight matrix before and after fine-tuning. Measures structural deformation of the learned transformation, not just the magnitude of change.
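A minimal sketch of how such a per-layer score might be computed. The function name and the choice of Jensen-Shannon divergence between normalized spectra are illustrative assumptions, not the project's specification:

```python
import numpy as np

def spectral_divergence(w_base, w_tuned):
    """Illustrative: Jensen-Shannon divergence between the
    normalized singular value spectra of one layer's weights
    before and after fine-tuning."""
    s_base = np.linalg.svd(w_base, compute_uv=False)
    s_tuned = np.linalg.svd(w_tuned, compute_uv=False)
    p = s_base / s_base.sum()          # spectrum as a distribution
    q = s_tuned / s_tuned.sum()
    m = 0.5 * (p + q)
    eps = 1e-12                        # numerical guard for the log
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 64))
w_tuned = w_base + 0.3 * rng.normal(size=(64, 64))
print(spectral_divergence(w_base, w_base))   # identical weights -> 0.0
print(spectral_divergence(w_base, w_tuned))  # deformed spectrum -> positive
```

Comparing normalized spectra rather than raw weight deltas is what makes this a shape measure: uniformly scaling a layer leaves the score at zero.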
Representation Drift (CKA)
Centered Kernel Alignment between base and fine-tuned model activations on a fixed 1,000-sample probe set. Detects whether internal representations have shifted.
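The linear form of CKA from Kornblith et al. (2019) is cheap to compute once probe activations are collected; this sketch assumes activations arrive as (samples x features) matrices and is not tied to any particular model:

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape
    (n_samples, n_features); feature counts may differ."""
    x = x - x.mean(axis=0)             # center each feature column
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)

rng = np.random.default_rng(1)
base_acts = rng.normal(size=(1000, 128))    # fixed 1,000-sample probe set
tuned_acts = base_acts + 0.5 * rng.normal(size=(1000, 128))
print(linear_cka(base_acts, base_acts))     # identical activations -> 1.0
print(linear_cka(base_acts, tuned_acts))    # drifted representations -> below 1
```

CKA is invariant to orthogonal transformation and isotropic scaling of either activation space, which is why it detects genuine representational shift rather than benign reparameterization.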
Curvature Shift
Hessian trace estimates via Hutchinson's stochastic estimator at both checkpoints. Detects sharp-to-flat transitions that correlate with generalization changes.
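Hutchinson's estimator needs only Hessian-vector products, which autograd frameworks supply via a double backward pass. The sketch below substitutes an explicit symmetric matrix for the Hessian to show the mechanics; the function name and sample count are illustrative:

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_samples=200, seed=0):
    """Estimate tr(H) from Hessian-vector products alone,
    using Rademacher probes: tr(H) ~= E[v^T H v]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / n_samples

# Demo: an explicit symmetric PSD matrix stands in for the Hessian.
# In practice hvp(v) would be an autograd Hessian-vector product.
rng = np.random.default_rng(42)
a = rng.normal(size=(50, 50))
h = a @ a.T
est = hutchinson_trace(lambda v: h @ v, dim=50)
print(est, np.trace(h))  # estimate lands close to the exact trace
```

The point of the estimator is that it never materializes the Hessian, so the same loop scales to billion-parameter checkpoints where the full matrix is unrepresentable.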
Weight Geometry
Per-layer L2 distance normalized by layer size, weighted by layer depth. Captures raw magnitude of parameter shift across the network.
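A sketch of the depth-weighted distance; the linear depth weighting and per-parameter normalization shown here are one plausible reading of the description, not a committed formula:

```python
import numpy as np

def weight_geometry(base_layers, tuned_layers):
    """Mean over layers of size-normalized L2 parameter shift,
    weighted linearly by depth (weighting scheme is illustrative)."""
    n = len(base_layers)
    total = 0.0
    for i, (wb, wt) in enumerate(zip(base_layers, tuned_layers)):
        shift = np.linalg.norm(wt - wb) / np.sqrt(wb.size)  # per-parameter scale
        total += ((i + 1) / n) * shift                      # deeper layers count more
    return total / n

rng = np.random.default_rng(2)
base = [rng.normal(size=(32, 32)) for _ in range(4)]
tuned = [w + 0.05 * rng.normal(size=w.shape) for w in base]
print(weight_geometry(base, base))   # no shift -> 0.0
print(weight_geometry(base, tuned))  # small positive shift
```

Normalizing by layer size keeps a 4096-wide projection from dominating the score over a small embedding table purely by parameter count.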
Scientific Foundation
SDI combines four independently replicated research results; no novel theoretical claims are required. The innovation is engineering these known signals into a validated, predictive governance tool.
- Martin & Mahoney (2021). Implicit self-regularization in deep neural networks. JMLR.
- Kornblith et al. (2019). Similarity of neural network representations revisited. ICML.
- Keskar et al. (2017). On large-batch training for deep learning: generalization gap and sharp minima. ICLR.
- Aghajanyan et al. (2021). Intrinsic dimensionality explains the effectiveness of language model fine-tuning. ACL.
Phase I Plan (NSF SBIR, $305K, 9 Months)
Objective 1: Define and Formalize SDI
Mathematical specification and reproducible computation pipeline. Open-source implementation. Under 10 minutes for 7B models.
Objective 2: Fine-Tune Regression Dataset
50+ base-to-fine-tune pairs across 5 model families (Llama, Mistral, Phi, Gemma, Qwen). Domain, instruction, and deliberately degraded fine-tunes. Full benchmark evaluation on both endpoints.
Objective 3: Validate Predictive Correlation
Spearman rho ≥ 0.7 between SDI and degradation magnitude across MMLU, IFEval, ToxiGen, and TruthfulQA. False negative rate < 15% for high-regression cases.
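The acceptance criterion reduces to a plain rank correlation between SDI scores and measured benchmark drops. A minimal sketch with made-up numbers (both the data and the threshold wiring are illustrative):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation; no tie handling, which is
    adequate for continuous SDI and degradation scores."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))

# Hypothetical scores: SDI per fine-tune vs measured benchmark drop.
sdi = np.array([0.10, 0.40, 0.20, 0.80, 0.60, 0.30])
degradation = np.array([0.02, 0.10, 0.05, 0.30, 0.22, 0.07])
rho = spearman_rho(sdi, degradation)
print(rho >= 0.7)  # perfectly monotone toy data, so True
```

Rank correlation is the right target here because SDI only needs to order fine-tunes by risk; it does not need to predict the absolute size of the benchmark drop.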
