[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog-post-17-textbook-formulas-compiled":3},{"id":4,"title":5,"slug":6,"excerpt":7,"category":8,"tags":9,"author_name":14,"cover_image":15,"status":16,"view_count":17,"reading_time_minutes":18,"published_at":19,"updated_at":19,"created_at":20,"content":21,"meta_description":22,"og_image":15,"canonical_url":23,"author_uid":23,"previous_slugs":24,"images":25},"69c95b45b422d9f69ccff182","We Compiled 17 Textbook Formulas to PyTorch. Here's What Broke.","17-textbook-formulas-compiled","We ran 17 standard ML formulas through MathExec's compiler. 10 compiled cleanly. 7 didn't. The failures taught us more than the successes.","engineering",[10,11,12,13],"benchmark","pytorch","compiler","formulas","Kingsley Michael","https:\u002F\u002Fmathexec.com\u002Fblog\u002Fimages\u002F69c95b45b422d9f69ccff182\u002Ffbec4527-673c-42d9-b3b4-ea23925ad4eb.png","published",13,8,"2026-04-01T15:27:37.410000","2026-03-29T17:02:52.547000","# We Compiled 17 Textbook Formulas to PyTorch. Here's What Broke.\n\nEvery ML textbook teaches you formulas. Linear regression is `y = mx + b`. Logistic regression is `y = σ(Wx + b)`. A two-layer neural network is `y = σ(W₂ · ReLU(W₁x + b₁) + b₂)`. These expressions are clean, compact, and universal.\n\nThen you sit down to implement them and write 50-100 lines of PyTorch.\n\nMathExec's formula compiler is supposed to bridge that gap: paste the LaTeX, get a working PyTorch model. But mathematical notation has ambiguities, conventions, and edge cases that make compilation harder than it looks. We decided to stress-test the compiler by running 17 standard ML formulas through it and documenting exactly what happened.\n\n## The 17 formulas\n\nWe picked formulas that span the range of what you'd encounter in an ML textbook or course:\n\n| # | Formula | Type |\n|---|---------|------|\n| 1 | `y = mx + b` | Simple linear regression |\n| 2 | `y = ax² + bx + c` | Polynomial regression |\n| 3 | `y = Wx + b` | Multivariate linear |\n| 4 | `y = σ(Wx + b)` | Logistic regression |\n| 5 | `y = σ(W₂ · ReLU(W₁x + b₁) + b₂)` | 2-layer MLP (binary) |\n| 6 | `y = W₃ · ReLU(W₂ · ReLU(W₁x + b₁) + b₂) + b₃` | 3-layer MLP (regression) |\n| 7 | `y = σ(W₂ · tanh(W₁x + b₁) + b₂)` | Tanh hidden layer |\n| 8 | `y = softmax(Wx + b)` | Linear softmax classifier |\n| 9 | `y = softmax(W₂ · ReLU(W₁x + b₁) + b₂)` | NN multi-class classifier |\n| 10 | `y = softmax(QKᵀ \u002F √d)V` | Attention mechanism |\n| 11 | `y = f(x) + x` | Residual\u002Fskip connection |\n| 12 | `y = LSTM(x)` | Recurrent (LSTM) |\n| 13 | `y = GRU(x)` | Recurrent (GRU) |\n| 14 | `y = Conv1d(x)` | 1D convolution |\n| 15 | `y = Transformer(x)` | Transformer encoder |\n| 16 | `y = αx + β` | Greek scalar parameters |\n| 17 | `y = BN(ReLU(Wx + b))` | Batch normalization |\n\nWe ran each one through the compiler with `n_features=4` (simulating a 4-column CSV input) and `output_dim=1`.\n\n## Results: 10 compiled, 7 didn't\n\n### What compiled cleanly\n\n**Formulas 1-7** (linear, polynomial, logistic, MLP variants) all compiled successfully. The parser correctly identified the model type, the compiler generated the right PyTorch module, and the models were ready to train.\n\nThe pattern matching here is straightforward: the compiler recognizes `σ(...)` as sigmoid activation, `ReLU(...)` and `tanh(...)` as hidden layer activations, capital letters as weight matrices, and lowercase letters as biases. 
These conventions cover the vast majority of textbook ML formulas.\n\n**Formulas 12-15** (LSTM, GRU, Conv1d, Transformer) also compiled. These are handled by named-architecture matching: the parser sees `LSTM(x)` and routes to a pre-built `RecurrentModel` class. It's less \"compilation\" and more \"pattern dispatch,\" but the result is the same: you write the formula, you get a working model.\n\n### What broke (and why)\n\n**Formula 8: `y = softmax(Wx + b)`** failed. The parser detected it as a generic formula, not a classification model. Why? The compiler's pattern matching looks for `σ(...)` to identify classification, but `softmax(...)` uses a different code path that wasn't wired into the main detection logic. The fix is straightforward (add softmax to the classifier detection patterns), but the failure reveals how brittle pattern-based compilation can be.\n\n**Formula 9: `y = softmax(W₂ · ReLU(W₁x + b₁) + b₂)`** failed for the same reason. The outer activation determines the model type, and the compiler didn't recognize softmax-wrapped MLPs.\n\n**Formula 10: `y = softmax(QKᵀ \u002F √d)V`** failed entirely. This is a self-attention head, and the LaTeX notation doesn't match any of the compiler's patterns. The formula uses conventions (Q, K, V for query\u002Fkey\u002Fvalue) that are specific to the attention literature, not general mathematical notation. The compiler has an `AttentionModel` class, but it's triggered by the keyword `Transformer` or `attention`, not by the actual mathematical expression.\n\n**Formula 11: `y = f(x) + x`** (skip connection) failed because `f(x)` is an abstract function reference, not a concrete operation. The compiler doesn't know what `f` is. In practice, skip connections appear as part of larger formulas like `y = ReLU(W₂ · ReLU(W₁x + b₁) + b₂ + x)` where the `+ x` at the end creates the residual path. The compiler handles that pattern through its `ResidualModel`, but the abstract notation `f(x) + x` doesn't trigger it.\n\n**Formula 16: `y = αx + β`** failed because the parser doesn't yet map Greek letters to the same roles as Latin letters. `m` and `b` in `y = mx + b` are recognized as slope and intercept, but `α` and `β` aren't given the same treatment. This is a parser limitation: Greek letters should work identically to their Latin equivalents for parameter roles.\n\n**Formula 17: `y = BN(ReLU(Wx + b))`** failed because batch normalization isn't in the compiler's operation dictionary. The model constructors handle linear layers, activations, and output layers, but BN is an intermediate normalization step that doesn't map cleanly to the existing compilation pipeline.\n\n## What the failures teach us\n\nThe 7 failures cluster into three categories:\n\n**Missing pattern coverage (formulas 8, 9):** The compiler knows what softmax is but doesn't recognize it in all positions. This is a completeness gap, not an architectural limitation. These are the easiest to fix.\n\n**Notation ambiguity (formulas 10, 11, 16):** Mathematical notation is context-dependent. `f(x)` means something different in a calculus textbook and an ML paper. `Q`, `K`, `V` have special meaning only in the attention literature. The compiler can't resolve these without domain-specific conventions beyond standard math.\n\n**Unsupported operations (formula 17):** Batch normalization is a computation that doesn't have a clean mathematical symbol. It's written as `BN(x)` in papers, but that's an abbreviation, not a standard mathematical function. 
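\n\nTo make the gap concrete, here is a hypothetical sketch of what an operation dictionary in this style might look like; the `OPERATIONS` table is our own illustration, not MathExec's source.\n\n```python\nimport torch.nn as nn\n\n# Illustrative operation dictionary: maps a symbol that appears in the formula\n# to the torch.nn module that implements it. (Our sketch, not MathExec's code.)\nOPERATIONS = {\n    'σ': nn.Sigmoid,\n    'ReLU': nn.ReLU,\n    'tanh': nn.Tanh,\n    'softmax': nn.Softmax,\n}\n\n# There is no 'BN' key, so y = BN(ReLU(Wx + b)) has nothing to dispatch to\n# (nn.BatchNorm1d would be the obvious candidate to register under 'BN').\n```\n\n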
Supporting it requires explicitly adding it to the compiler's operation dictionary.\n\n## The line count comparison\n\nFor the 10 formulas that compiled, we checked the equivalent hand-written PyTorch code:\n\n| Formula | LaTeX (chars) | PyTorch (lines) | Ratio |\n|---------|:---:|:---:|:---:|\n| `y = mx + b` | 11 | 42 | 1:4 |\n| `y = σ(Wx + b)` | 14 | 48 | 1:3 |\n| `y = σ(W₂·ReLU(W₁x+b₁)+b₂)` | 28 | 62 | 1:2 |\n| `y = LSTM(x)` | 12 | 71 | 1:6 |\n| `y = Transformer(x)` | 18 | 89 | 1:5 |\n\nThe \"PyTorch lines\" column counts everything needed for a working, trainable script: imports, model class definition, data loading, training loop, and evaluation. The ratio gets *worse* (more lines per character of LaTeX) as the architecture gets more complex, because the implementation keeps growing while the formula stays compact.\n\n## What's next\n\nWe're working on fixing the three failure categories:\n\n1. **Softmax pattern coverage** is being added to the classifier detection logic\n2. **Greek letter support** needs a parser update to treat α, β, γ the same as a, b, c\n3. **Batch normalization** requires adding BN as a recognized operation\n\nThe attention formula (`softmax(QKᵀ\u002F√d)V`) is a harder problem. It requires the compiler to understand domain-specific conventions, not just mathematical syntax. We're exploring whether a hybrid approach (rule-based parsing plus LLM-assisted interpretation) can handle these cases reliably.\n\nOur goal isn't 17\u002F17. It's covering the formulas that practitioners actually write most often. The first 7 (linear through tanh MLP) account for the vast majority of formulas on MathExec's canvas today.\n\n## Methodology\n\nWe tested each formula using MathExec's formula compiler with consistent parameters: `n_features=4` (simulating a 4-column CSV) and `output_dim=1`. Each formula was tested both through the parser (pattern detection) and the full compilation pipeline (model generation).\n\nFor the line count comparison, we wrote the complete, runnable PyTorch equivalent for each compiled formula. \"Complete\" means: imports, model class, Dataset class, DataLoader, training loop, evaluation, and a final print statement with the metric. No shortcuts, no library helpers beyond base PyTorch.\n\nWe used a minimal coding style with no unnecessary abstractions. If you write verbose code with helper functions and extensive error handling, your line counts will be higher. If you use Lightning or fastai, they'll be lower. Our numbers represent the baseline PyTorch experience.\n\n## Why 10\u002F17 matters\n\nA 59% success rate doesn't sound great until you consider what the 10 successful formulas cover. Linear regression, polynomial regression, logistic regression, MLPs of any depth with any standard activation, LSTMs, GRUs, convolutions, and transformers. That's the vast majority of what practitioners actually use day-to-day.\n\nThe 7 failures are real, and we're working on fixing them. But they represent edge cases (Greek parameters, batch normalization) or domain-specific notation (attention mechanisms written in explicit math form) rather than core workflow gaps. Most users typing formulas into MathExec will hit the 10 that work, not the 7 that don't.\n\nThat said, we're publishing these results precisely because we think transparency about limitations builds more trust than claiming everything works perfectly.\n\n---\n\n*Test these formulas yourself in [MathExec](https:\u002F\u002Fmathexec.com\u002Fapp). 
The compiler is updated frequently, so some of the failures above may already be fixed by the time you read this.*\n","We compiled 17 textbook ML formulas to PyTorch using MathExec's formula compiler. Results, failures, and what we learned about parsing mathematical notation.",null,[],[26],"blog\u002F69c95b45b422d9f69ccff182\u002F77ccdab5-2aaa-4004-a033-85b3ae435581.png"]