[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog-post-formula-complexity-vs-performance":3},{"id":4,"title":5,"slug":6,"excerpt":7,"category":8,"tags":9,"author_name":14,"cover_image":15,"status":16,"view_count":17,"reading_time_minutes":18,"published_at":19,"updated_at":19,"created_at":20,"content":21,"meta_description":22,"og_image":15,"canonical_url":23,"author_uid":23,"previous_slugs":24,"images":25},"69c95b46b422d9f69ccff185","Formula Complexity vs. Model Performance: Do More Complex Equations Actually Train Better?","formula-complexity-vs-performance","We trained 6 progressively complex formulas on the same dataset. The results surprised us.","research",[10,11,12,13],"experiment","benchmark","complexity","model-selection","Kingsley Michael","https:\u002F\u002Fmathexec.com\u002Fblog\u002Fimages\u002F69c95b46b422d9f69ccff185\u002F6cf3b8c7-78c9-418d-aa8c-87248389d6a3.png","published",16,8,"2026-04-01T15:27:37.410000","2026-03-29T17:02:52.547000","# Formula Complexity vs. Model Performance: Do More Complex Equations Actually Train Better?\n\nThere's a default assumption in ML that more complex models perform better. More layers, more parameters, more capacity to learn. And in the deep learning era, that intuition is often correct for large datasets. But for the tabular datasets that most practitioners work with day-to-day (hundreds to tens of thousands of rows, a handful of features), the relationship between formula complexity and model performance is less clear.\n\nWe ran an experiment. Same dataset, 6 formulas of increasing complexity, same training configuration. Here's what happened.\n\n## Experimental setup\n\n**Dataset**: UCI Heart Disease dataset. 303 rows, 13 features, binary target (presence of heart disease). A standard benchmark that's small enough to train fast but real enough to produce meaningful results.\n\n**Formulas** (ordered by complexity):\n\n| Level | Formula | Params (approx) |\n|:---:|---------|:---:|\n| 1 | `y = σ(Wx + b)` | 14 |\n| 2 | `y = σ(W₂ · ReLU(W₁x + b₁) + b₂)` | 961 |\n| 3 | `y = σ(W₃ · ReLU(W₂ · ReLU(W₁x + b₁) + b₂) + b₃)` | 5,185 |\n| 4 | `y = σ(W₄ · ReLU(W₃ · ReLU(W₂ · ReLU(W₁x + b₁) + b₂) + b₃) + b₄)` | 9,409 |\n| 5 | Same as 4, with hidden_dim=128 | ~34,000 |\n| 6 | Same as 4, with hidden_dim=256 | ~134,000 |\n\nLevels 1-4 increase depth (more layers). Levels 4-6 increase width (more parameters per layer) at the same depth.\n\n**Training configuration**: Adam optimizer, learning rate 0.001, batch size 32, 200 epochs, 80\u002F20 train\u002Fvalidation split. Same random seed across all runs. Features standardized to zero mean, unit variance.\n\nAll runs were done in MathExec with default settings except where noted.\n\n## Results\n\n| Level | Formula complexity | Train acc | Val acc | Gap | Training time | Epochs to converge |\n|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n| 1 | 1 layer, 14 params | 84.3% | 83.6% | 0.7% | 1.2s | ~40 |\n| 2 | 2 layers, 961 params | 88.7% | 86.9% | 1.8% | 2.1s | ~60 |\n| 3 | 3 layers, 5K params | 91.2% | 85.2% | 6.0% | 3.4s | ~90 |\n| 4 | 4 layers, 9K params | 94.8% | 83.6% | 11.2% | 4.8s | ~120 |\n| 5 | 4 layers, 34K params | 97.3% | 82.0% | 15.3% | 6.2s | ~150 |\n| 6 | 4 layers, 134K params | 99.1% | 80.3% | 18.8% | 9.1s | ~180 |\n\n## What the data shows\n\n### The sweet spot is shallower than you'd think\n\nLevel 2 (a simple 2-layer MLP with ~960 parameters) achieved the best validation accuracy at 86.9%. Adding a third layer (Level 3) improved training accuracy but *reduced* validation accuracy. 
## Results

| Level | Formula complexity | Train acc | Val acc | Gap | Training time | Epochs to converge |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1 | 1 layer, 14 params | 84.3% | 83.6% | 0.7% | 1.2s | ~40 |
| 2 | 2 layers, 961 params | 88.7% | 86.9% | 1.8% | 2.1s | ~60 |
| 3 | 3 layers, 5K params | 91.2% | 85.2% | 6.0% | 3.4s | ~90 |
| 4 | 4 layers, 9K params | 94.8% | 83.6% | 11.2% | 4.8s | ~120 |
| 5 | 4 layers, 34K params | 97.3% | 82.0% | 15.3% | 6.2s | ~150 |
| 6 | 4 layers, 134K params | 99.1% | 80.3% | 18.8% | 9.1s | ~180 |

## What the data shows

### The sweet spot is shallower than you'd think

Level 2 (a simple 2-layer MLP with ~960 parameters) achieved the best validation accuracy at 86.9%. Adding a third layer (Level 3) improved training accuracy but *reduced* validation accuracy. By Level 6, the model was essentially memorizing the training data (99.1% train, 80.3% validation).

This is textbook overfitting, but seeing it happen across a clean progression of formula complexity makes the dynamic very visible. The train-validation gap grows monotonically with complexity: 0.7% → 1.8% → 6.0% → 11.2% → 15.3% → 18.8%.

### Depth hurts before width does

Comparing Levels 2-4 (same width, increasing depth) against Levels 4-6 (same depth, increasing width), depth caused faster overfitting on this dataset. Going from 2 to 4 layers increased the gap by 9.4 percentage points. Going from 9K to 134K parameters at constant depth increased it by 7.6 points.

This makes intuitive sense for small tabular datasets: each layer adds the ability to learn more abstract features, but with only 303 rows and 13 features, there aren't many abstract features to learn. Extra layers just learn noise.

### Training time scales linearly with complexity

Training time went from 1.2 seconds (Level 1) to 9.1 seconds (Level 6). All of these are fast enough to be interactive. You'd have to go much larger (millions of parameters, thousands of rows) before training time becomes a bottleneck on tabular data.

### Convergence takes longer for complex models

The simplest model converged in ~40 epochs. The most complex needed ~180. This matters more than wall-clock time because it affects your feedback loop: if you're watching the loss curve and waiting for it to stabilize, a 180-epoch run feels longer than a 40-epoch one even if both finish in seconds.

## What this means for formula selection

### Start simple, escalate only with evidence

The Level 1 formula (`y = σ(Wx + b)`) is a single short line of LaTeX and gets you to 83.6% validation accuracy. The Level 2 formula (`y = σ(W₂ · ReLU(W₁x + b₁) + b₂)`) is only slightly longer and gets you to 86.9%. That 3.3 percentage point improvement is worth the extra complexity.

But going from Level 2 to Level 3 nests one more `ReLU(...)` layer and *loses* 1.7 points of validation accuracy. The complexity isn't paying for itself anymore.

The practical rule: add complexity one level at a time. If validation accuracy doesn't improve, stop. Don't add layers hoping that more will eventually help. On tabular data, it usually doesn't.

### The dataset size threshold

This experiment used 303 rows. Results would look different on 30,000 rows or 300,000 rows, where deeper models have enough data to learn meaningful abstractions without memorizing. The smaller your dataset, the more aggressive the overfitting at higher complexity levels.

A rough heuristic from our testing: for tabular datasets, keep the number of parameters in your model to no more than 10-20% of the number of training samples. Level 2 (961 params, 242 training samples) is already well past that boundary yet still generalizes best here, so treat the heuristic as a warning threshold rather than a hard rule. Level 6 (134K params) has more than 500 times as many parameters as there are training samples. A quick way to run this check before training is sketched below.
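A hypothetical helper for that pre-training check, counting MLP parameters straight from the layer widths; this is not a MathExec feature. The hidden widths are assumptions: 64 reproduces Level 2's count exactly, while 128 and 256 land near the table's approximate counts for Levels 5 and 6 (Levels 3 and 4 are omitted because their exact counts don't pin down an obvious width).

```python
# Hypothetical pre-training check (not a MathExec feature): count MLP
# parameters from layer widths and compare against the heuristic budget.
def mlp_param_count(widths):
    """widths = [input_dim, *hidden_dims, output_dim]; counts weights plus biases."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))

n_train = 242            # 80% train split of 303 rows
budget = 0.2 * n_train   # upper end of the 10-20% heuristic: ~48 params

# Assumed widths: 64 gives Level 2's 961 exactly; 128 and 256 give
# ~34,900 and ~135,400, close to the table's ~34K and ~134K.
for name, widths in [
    ("Level 1", [13, 1]),
    ("Level 2", [13, 64, 1]),
    ("Level 5", [13, 128, 128, 128, 1]),
    ("Level 6", [13, 256, 256, 256, 1]),
]:
    print(f"{name}: {mlp_param_count(widths)} params vs a budget of ~{budget:.0f}")
```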
### Formula complexity is not the only lever

On this dataset, feature engineering (creating interaction terms, normalizing differently, handling categorical features) would likely improve Level 2's performance more than switching to Level 4. The Data Studio's NL transforms let you try things like "create interaction terms between age and cholesterol" in a few seconds.

When your model is already at 86.9% and the extra complexity is just memorizing noise, the signal is in the data, not in the architecture.

## Reproducing this experiment

All of this can be replicated in MathExec:

1. Upload the UCI Heart Disease dataset (available as a standard CSV from the UCI ML Repository)
2. Type each formula, train, and check the experiment results
3. Compare all 6 experiments in the Experiments tab using the loss curve overlay

The training uses a fixed seed, so you should get similar results (though not bit-identical, due to hardware differences).

## Raw data

We're publishing the full training logs (per-epoch loss and accuracy for all 6 runs) alongside this post. If you have a different dataset you'd like to see this experiment repeated on, let us know.

## Caveats and extensions

### This is one dataset

The Heart Disease dataset has 303 rows and 13 features. On ImageNet (1.2 million images, 1000 classes), the relationship between complexity and performance looks very different: deeper models genuinely perform better because there are enough samples to support the additional capacity.

The takeaway isn't "simple models are always better." It's "simple models are often sufficient for small tabular datasets." The dataset size determines where the complexity sweet spot falls.

### Regularization changes the picture

We ran these experiments without explicit regularization (no dropout, no weight decay). Adding dropout to the deeper models would reduce overfitting and potentially shift the sweet spot toward higher complexity. We chose not to include regularization to isolate the effect of formula complexity alone, but in practice, regularization is another knob you should turn.

### The experiment is trivially reproducible

Every data point in this post can be reproduced in MathExec in about 5 minutes by following the steps in "Reproducing this experiment" above.

If you run this experiment on a different dataset (more rows, more features, a different task), we'd love to see your results. The question "how complex should my formula be?" depends on the data, and more data points make the answer more useful.

### Practical recommendation

For tabular datasets under 10,000 rows: start with Level 2 (`y = σ(W₂ · ReLU(W₁x + b₁) + b₂)`). If it plateaus below your target accuracy, try Level 3. If Level 3 doesn't help, the answer is better features, not a deeper model.

For larger datasets (100K+ rows), you have more room to explore Levels 3-4. But even then, check whether the complexity is buying you real validation improvement or just training set memorization.

## Why we published this data

Most advice about model complexity is qualitative: "don't overfit," "start simple," "use regularization." This experiment puts numbers behind those platitudes. A 2-layer MLP with 961 parameters outperforms a 4-layer MLP with 134,000 parameters on this dataset. That's a 140x difference in model size for worse results.

We think more experiments like this, on more datasets, would help practitioners build better intuition about when complexity helps and when it hurts. If you run a similar experiment on your own data, we'd love to see the results.

---

*Run this experiment yourself in [MathExec](https://mathexec.com/app). Upload any CSV, try progressively complex formulas, and compare results in the Experiments tab.*