Machine Learning Training Dynamics

Five interactive visualizations exploring optimization, generalization, and model behavior

Decision Boundaries

Neural networks learn to separate classes by finding decision boundaries. Watch the boundary evolve from a poor initial guess to a circular shape that cleanly separates the inner and outer clusters.

The model learns a circular boundary that separates two concentric clusters. Early epochs show a small elliptical boundary. Later epochs reveal the circular shape needed to fit the data. Neural networks with hidden layers can learn circular and other complex decision boundaries by learning nonlinear transformations of the input features.
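The same effect can be reproduced outside the visualization. Below is a small Python sketch (the project itself is built in D3.js, but scikit-learn, which the project used during development, makes the point in a few lines): `make_circles` stands in for the concentric clusters, and a one-hidden-layer network is compared against a linear model.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Two concentric clusters: no straight line can separate them.
X, y = make_circles(n_samples=400, factor=0.4, noise=0.05, random_state=0)

# A purely linear model has a straight decision boundary and fails here...
linear = LogisticRegression().fit(X, y)

# ...while a small ReLU hidden layer lets the network bend its boundary
# into the circle the data requires.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)

print(f"linear accuracy: {linear.score(X, y):.2f}")
print(f"MLP accuracy:    {mlp.score(X, y):.2f}")
```

The linear model scores near chance on this data, while even a modest hidden layer is enough to recover a circular boundary.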

Loss Curves and Overfitting

Training loss keeps dropping, but validation loss starts climbing back up. This is overfitting. The model memorizes training data instead of learning patterns. Scroll to watch it happen.

Scroll to progress through 100 epochs

Around epoch 40, validation loss stops improving and starts increasing while training loss continues to drop. This gap signals overfitting. The model becomes too specialized to training data quirks and noise, losing its ability to generalize to new examples. Solutions include early stopping (halt training when validation loss increases), regularization, dropout, or gathering more training data. The shaded regions show variance across different training runs.
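The early-stopping rule mentioned above is simple to state in code. This is an illustrative sketch, not the project's implementation: `early_stopping_epoch` and the synthetic loss curve are hypothetical, with the curve shaped to improve until roughly epoch 40 and then climb, mirroring the visualization.

```python
# Stop when validation loss fails to improve for `patience` consecutive epochs.
def early_stopping_epoch(val_losses, patience=5):
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # halt here; keep the weights from best_epoch
    return len(val_losses) - 1

# Synthetic validation curve: improves until epoch 40, then rises (overfitting).
losses = [1.0 / (e + 1) + max(0, e - 40) * 0.01 for e in range(100)]
stop = early_stopping_epoch(losses)
print(stop)  # halts a few epochs past the minimum at epoch 40
```

In practice you would also checkpoint the model at the best epoch, so that stopping late costs nothing.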

Optimizer Comparison

Different optimizers take different paths down the same loss surface. Watch SGD and Adam navigate a 2D landscape. Each has unique strengths and trade-offs.

SGD (Stochastic Gradient Descent) takes direct steps opposite to the gradient. Simple and reliable, but it can be slow in flat regions and tends to oscillate across narrow ravines instead of moving along them. Adam is an adaptive optimizer that adjusts the learning rate per parameter using both momentum and second-moment estimates. It often converges faster and handles sparse gradients well, making it the default choice for many deep learning tasks. There is no universally best optimizer; the choice depends on your problem, data characteristics, and computational constraints.
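To make the two update rules concrete, here is a minimal Python sketch of both on a simple elongated quadratic bowl. The loss surface and hyperparameters are illustrative assumptions, not the project's actual landscape; the Adam update follows the standard form with bias-corrected moment estimates.

```python
import numpy as np

# Elongated bowl: L(w) = 0.5 * (w0^2 + 10 * w1^2), so grad = (w0, 10 * w1).
def grad(w):
    return np.array([w[0], 10.0 * w[1]])

def sgd(w, lr=0.05, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)  # step directly against the gradient
    return w

def adam(w, lr=0.05, steps=200, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g       # first moment (momentum)
        v = b2 * v + (1 - b2) * g * g   # second moment (per-parameter scale)
        m_hat = m / (1 - b1 ** t)       # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_sgd = sgd(np.array([3.0, 2.0]))
w_adam = adam(np.array([3.0, 2.0]))
print(w_sgd, w_adam)  # both end near the minimum at the origin
```

On this surface both reach the minimum; the interesting difference is the path, which is what the visualization traces.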

Bias-Variance Tradeoff

Model complexity determines whether you underfit, overfit, or find the sweet spot. These four scenarios show the classic tradeoff in action with polynomial regression.

  • Top-left: just-right complexity. Fits the training data well and generalizes to test data.
  • Top-right: overfit (too complex). Near-perfect on training data but terrible on test data because the model learns the noise.
  • Bottom-left: underfit (too simple). Poor on both training and test data; the model cannot capture the underlying pattern.
  • Bottom-right: worst case (wrong model plus noisy data). Erratic predictions that are unstable across datasets.
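The tradeoff is easy to reproduce numerically. The sketch below is illustrative (the ground-truth cubic, noise level, and degrees 1, 3, and 15 are choices made here, not taken from the project): an underfit line, a well-matched cubic, and an overfit high-degree polynomial, compared on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is a cubic; train and test samples share the same noise level.
def f(x):
    return x ** 3 - x

x_train = rng.uniform(-1.5, 1.5, 30)
y_train = f(x_train) + rng.normal(0, 0.2, 30)
x_test = rng.uniform(-1.5, 1.5, 200)
y_test = f(x_test) + rng.normal(0, 0.2, 200)

def mse(degree):
    # Least-squares polynomial fit on the training set only.
    coefs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train, test

for d in (1, 3, 15):  # underfit, just right, overfit
    tr, te = mse(d)
    print(f"degree {d:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

The degree-1 model is poor everywhere (bias), while the degree-15 model pushes training error down by fitting noise, widening the gap between training and test error (variance).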

Feature Space Evolution

Watch how a neural network learns to separate classes in feature space. As training progresses, initially mixed clusters gradually separate into distinct groups.

This shows a 2D projection of the network's internal representations. Each point is a data sample, colored by its true class. Early epochs show random and mixed features. The network has not learned meaningful representations yet. Later epochs show classes separating into distinct clusters as the network learns discriminative features. Good feature representations are the foundation of classification. This visualization shows the network learning to see the differences between classes.
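One way to see this outside the animation is to train a small network and recompute its hidden-layer activations by hand. The sketch below is an assumption-laden illustration, not the project's pipeline: it assumes a single ReLU hidden layer (scikit-learn's default activation) and uses a simple centroid-distance ratio, chosen here, as the separation measure.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.neural_network import MLPClassifier

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000,
                    random_state=0).fit(X, y)

# Forward pass to the hidden layer: H = relu(X W + b).
W, b = mlp.coefs_[0], mlp.intercepts_[0]
H = np.maximum(0, X @ W + b)

# Separation score: distance between class centroids,
# scaled by the average within-class spread.
def separation(Z):
    c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    spread = 0.5 * (Z[y == 0].std() + Z[y == 1].std())
    return np.linalg.norm(c0 - c1) / spread

print(f"input space:  {separation(X):.2f}")
print(f"hidden space: {separation(H):.2f}")
```

In input space the concentric classes share a centroid, so the score is near zero; in the learned feature space the clusters pull apart, which is exactly what the projection animates over training.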

About This Project

This project explores machine learning concepts through interactive visualizations. I selected topics that are inherently visual, including decision boundaries, loss landscapes, optimizer behavior, bias-variance tradeoffs, and feature space evolution, because animation and interactivity make these concepts more intuitive than static explanations.

Built with D3.js v7, the project experiments with natural animations and scroll-driven storytelling, inspired by sites like Distill.pub and r2d3.us. All data is synthetically generated in JavaScript using mathematical functions and procedural generation.

Tools & Further Reading

Libraries Used

  • D3.js v7 for interactive visualizations
  • Scikit-learn for exploring dataset patterns during development

Further Reading

  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv:1609.04747
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press

Inspiration