[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"blog-post-how-mathexec-compiles-latex-to-pytorch":3},{"id":4,"title":5,"slug":6,"excerpt":7,"category":8,"tags":9,"author_name":14,"cover_image":15,"status":16,"view_count":17,"reading_time_minutes":18,"published_at":19,"updated_at":19,"created_at":20,"content":21,"meta_description":22,"og_image":15,"canonical_url":23,"author_uid":23,"previous_slugs":24,"images":25},"69ac3871dfa72009eb0767bf","How MathExec Compiles LaTeX to PyTorch","how-mathexec-compiles-latex-to-pytorch","A deep dive into MathExec's formula compiler: how we parse LaTeX expressions and generate equivalent PyTorch modules with trainable parameters.","engineering",[10,11,12,13],"pytorch","latex","compiler","deep-dive","Kingsley Michael","https:\u002F\u002Fmathexec.com\u002Fblog\u002Fimages\u002F69ac3871dfa72009eb0767bf\u002F3f454f4e-0dfd-422a-8020-f43e6311f6e4.png","published",44,7,"2026-02-25T14:38:41.373000","2026-02-25T12:38:41.373000","# How MathExec Compiles LaTeX to PyTorch\n\nOne of the core pieces of MathExec is the **formula compiler**: the system that takes a LaTeX expression like `y = σ(W₂ · ReLU(W₁x + b₁) + b₂)` and produces a working PyTorch `nn.Module` with the right parameters, shapes, and forward pass.\n\nThis post walks through how it works, what design decisions we made along the way, and where we ran into interesting problems.\n\n## The pipeline\n\n```\nLaTeX string → Token stream → AST → PyTorch Module\n```\n\nEach stage handles a specific kind of complexity, and keeping them separate makes the compiler easier to test and extend.\n\n### Step 1: Tokenization\n\nThe compiler first tokenizes the LaTeX into meaningful symbols:\n\n```python\n\"y = \\sigma(W_2 \\cdot ReLU(W_1 x + b_1) + b_2)\"\n# →\n['y', '=', 'sigma', '(', 'W_2', 'cdot', 'ReLU', '(', 'W_1', 'x', '+', 'b_1', ')', '+', 'b_2', ')']\n```\n\nWe handle LaTeX commands (`\\sigma`, `\\cdot`, `\\frac`), subscripts (`W_1`), superscripts (`x^2`), and Greek letters. The tokenizer also normalizes variant representations: `\\sigmoid`, `\\sigma`, and `σ` all become the same token.\n\nOne tricky part is distinguishing between subscripts that are part of a variable name (like `W_1` meaning \"weight matrix 1\") and subscripts that have mathematical meaning (like `x_i` meaning \"the i-th element of x\"). We treat numeric subscripts as name qualifiers and alphabetic subscripts as indexing operations, which works for the vast majority of ML formulas.\n\n### Step 2: AST construction\n\nTokens are parsed into an abstract syntax tree following mathematical operator precedence:\n\n- Function application binds tightest: `σ(...)`, `ReLU(...)`\n- Multiplication (explicit `\\cdot` or implicit juxtaposition): `W₁x`\n- Addition\u002Fsubtraction: `... + b₁`\n- Comparison\u002Fassignment: `y = ...`\n\nImplicit multiplication is one of the more interesting parsing challenges. In mathematical notation, `Wx` means `W` times `x`, but there's no operator between them. The parser inserts an implicit multiplication node whenever two operands appear adjacent without an operator. This also handles cases like `2x` (scalar times variable) and `W₁W₂x` (chained matrix multiplications).\n\nParentheses and function application are handled by recursive descent. When the parser sees a known function name followed by `(`, it consumes everything up to the matching `)` as the function's argument. This naturally handles nested expressions like `σ(ReLU(...))`.\n\n### Step 3: Code generation\n\nThe AST is walked to emit PyTorch code. 
### Step 3: Code generation

The AST is walked to emit PyTorch code. Each node maps to a PyTorch operation:

| LaTeX | PyTorch | Notes |
|-------|---------|-------|
| `Wx` | `nn.Linear(in, out)` | Capital letter = weight matrix |
| `Wx + b` | `nn.Linear(in, out, bias=True)` | Bias detected and folded in |
| `σ(...)` | `torch.sigmoid(...)` | Sigmoid activation |
| `ReLU(...)` | `torch.relu(...)` | ReLU activation |
| `softmax(...)` | `F.softmax(..., dim=-1)` | Softmax with last-dim default |
| `tanh(...)` | `torch.tanh(...)` | Hyperbolic tangent |
| `x²` | `x ** 2` | Element-wise power |
| `\frac{a}{b}` | `a / b` | Division |
| `\sqrt{x}` | `torch.sqrt(x)` | Square root |
| `\exp(x)` | `torch.exp(x)` | Exponential |
| `\log(x)` | `torch.log(x)` | Natural logarithm |

When the compiler encounters a `Wx + b` pattern (linear transformation plus bias), it folds both into a single `nn.Linear` layer with `bias=True` rather than creating separate weight and bias parameters. This is both more efficient and produces cleaner generated code.
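To give a sense of how that fold works, here's a simplified sketch that pattern-matches a `W · x + b` subtree and emits a single `nn.Linear` call. The node classes and dimension names below are illustrative placeholders, not our internal representation:

```python
# Simplified sketch of the Wx + b fold; the real code generator handles
# many more node types and tracks dimensions symbolically.
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Mul:
    left: object
    right: object

@dataclass
class Add:
    left: object
    right: object

def fold_linear(node, in_dim="input_dim", out_dim="hidden"):
    """Emit an nn.Linear call string for W*x (+ b) subtrees, else None."""
    if isinstance(node, Add) and isinstance(node.left, Mul):
        w, b = node.left.left, node.right
        if (isinstance(w, Var) and w.name[0].isupper()
                and isinstance(b, Var) and b.name[0].islower()):
            return f"nn.Linear({in_dim}, {out_dim}, bias=True)"
    if isinstance(node, Mul) and isinstance(node.left, Var) and node.left.name[0].isupper():
        return f"nn.Linear({in_dim}, {out_dim}, bias=False)"
    return None

ast = Add(Mul(Var("W_1"), Var("x")), Var("b_1"))
print(fold_linear(ast))  # nn.Linear(input_dim, hidden, bias=True)
```

Without the fold, the same subtree would produce a standalone weight matrix, a separate `nn.Parameter` for the bias, and an explicit matmul in `forward`.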
### Step 4: Shape inference

Parameter shapes are inferred from the data at training time, not at compile time. When you provide a CSV with 10 input columns, the compiler sets `input_dim=10` on the first layer. Output dimensions are inferred from the target variable.

Hidden layer sizes are a harder problem because they're not specified in the formula. `y = σ(W₂ · ReLU(W₁x + b₁) + b₂)` tells us there's a hidden layer, but not how wide it should be. We default to 64 hidden units, which is a reasonable starting point for most tabular datasets. You can override this in the training configuration.

For deeper networks, each intermediate dimension defaults to the same hidden size. We've found that 64 works well for datasets under 10,000 rows, and users who need larger networks usually know enough to adjust the setting.

## Handling ambiguity

Mathematical notation is inherently ambiguous. `Wx + b` could mean matrix multiplication or element-wise multiplication. We use conventions that match how most ML textbooks write formulas:

- **Capital letters** (W, M, A) → weight matrices (`nn.Linear`)
- **Lowercase letters** (b, c) → bias vectors (`nn.Parameter`)
- **Greek letters** (α, β, γ) → scalar parameters
- **x, X** → input data (not trainable)
- **y** → output/target

These conventions handle about 95% of ML formulas correctly. For the remaining 5%, users can add explicit annotations in the formula editor to override the compiler's assumptions.

Another source of ambiguity is operator precedence with implicit multiplication. Does `2Wx` mean `(2W)x` or `2(Wx)`? Since scalar multiplication associates freely with matrix products, the result is the same either way. But for expressions like `ReLU Wx + b`, the parser needs to understand that `ReLU` is a function applied to `Wx + b`, not a variable being multiplied. We maintain a dictionary of known function names to resolve this.

## Loss function selection

The compiler also selects an appropriate loss function based on the output activation:

- `σ(...)` (sigmoid output) → `BCELoss` (binary cross-entropy)
- `softmax(...)` → `CrossEntropyLoss`
- No activation or linear output → `MSELoss` (mean squared error)

This heuristic is right for the most common cases. If you're doing something unusual (like regression with a sigmoid output for bounded predictions), you can override the loss function in the training panel.

## Example: 2-layer MLP

Input:
```latex
y = \sigma(W_2 \cdot ReLU(W_1 x + b_1) + b_2)
```

Generated PyTorch:
```python
import torch
import torch.nn as nn

class FormulaModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        hidden = 64  # default hidden size
        self.layer1 = nn.Linear(input_dim, hidden)
        self.layer2 = nn.Linear(hidden, output_dim)

    def forward(self, x):
        h = torch.relu(self.layer1(x))
        return torch.sigmoid(self.layer2(h))
```

The compiler handles nested expressions of arbitrary depth, so you can write complex architectures as a single formula. A 4-layer network with mixed activations works just as well:

```latex
y = softmax(W_4 \cdot ReLU(W_3 \cdot tanh(W_2 \cdot ReLU(W_1 x + b_1) + b_2) + b_3) + b_4)
```

## Error handling and edge cases

Not every formula compiles cleanly. The compiler needs to handle malformed input gracefully.

**Unbalanced parentheses** are the most common error. `y = σ(Wx + b` is missing a closing paren. The compiler detects this during parsing and reports the location of the mismatch.

**Unknown functions** like `y = foo(Wx + b)` produce a warning. If the function name doesn't match any known activation or operation, the compiler treats it as a user-defined function and falls back to a no-op, with a message suggesting alternatives.

**Circular definitions** like `y = y + 1` are caught during AST analysis. The compiler checks that the output variable doesn't appear on the right-hand side in a way that would create a feedback loop (skip connections like `y = f(x) + x` are fine because `x` is the input, not the output).

**Type mismatches** between operations (like adding a scalar to a matrix in a way that doesn't broadcast) are caught at training time when actual tensor shapes are known. The compiler generates code that's structurally valid, but shape errors only surface when data flows through the model.

We've found that clear error messages matter more than perfect error detection. Users don't mind if the compiler occasionally produces code that fails at training time, as long as the error message tells them what went wrong and how to fix it.
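As a rough sketch of the first and third checks, here's how they can be expressed over the token stream. The function names and messages are simplified stand-ins for the diagnostics the compiler actually reports:

```python
# Simplified stand-ins for two of the checks above; the real compiler
# reports richer diagnostics with source positions.

def check_parentheses(tokens):
    """Return a message describing the first paren mismatch, or None."""
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return f"unexpected ')' at token {i}"
    if depth > 0:
        return f"{depth} unclosed '(' at end of formula"
    return None

def check_circular(output_var, rhs_tokens):
    """Flag formulas like y = y + 1 where the output feeds back into itself."""
    if output_var in rhs_tokens:
        return f"'{output_var}' appears on both sides of '='"
    return None

print(check_parentheses(["sigma", "(", "W", "x", "+", "b"]))  # 1 unclosed '(' at end of formula
print(check_circular("y", ["y", "+", "1"]))                   # 'y' appears on both sides of '='
```

The circular-definition check only looks at the output variable, which is why skip connections like `y = f(x) + x` pass through untouched.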
## Performance considerations

The compilation step itself takes 10-50 milliseconds for typical formulas. It's not a bottleneck. The generated PyTorch code is standard `nn.Module` code with no overhead compared to hand-written models. There's no interpreter or formula evaluation at runtime; the compilation is a one-time code generation step.

For training, performance depends on the model size and dataset. Tabular models with a few hundred parameters train in seconds. Larger architectures (4+ layers, 256+ hidden units) on datasets with tens of thousands of rows might take a minute or two. This is the same performance you'd get from equivalent hand-written PyTorch, since that's exactly what the compiler generates.

We benchmarked the generated code against hand-written equivalents on several standard datasets (Iris, Boston Housing, MNIST subsets) and the training time difference is negligible, within 1-2% on average. The compiler doesn't introduce abstractions that slow things down. The output is the same `nn.Module` code you'd write yourself.

## What's next

We're working on expanding the compiler to handle more architecture types:

- **Attention mechanisms**: `softmax(QK^T / \sqrt{d})V`
- **Convolutions**: Spatial operators in formula notation
- **Recurrent structures**: Formulas with temporal subscripts
- **Skip connections**: `y = f(x) + x` residual patterns (partially supported already)

The goal is to cover the architectures you'd find in a typical ML course or research paper, so you can go from reading the paper to training the model in under a minute.

---

*Want to try it? Write a formula in [MathExec](https://mathexec.com/app) and see what it compiles to.*