[Figure: Data and Model Fit. Dashed = true relationship, red = model prediction.]
Understanding the fundamental tradeoff between model simplicity and flexibility in machine learning.
We're exploring a simple relationship: how does the number of hours of sleep before an exam affect the number of mistakes made? The true relationship is U-shaped—both too little and too much sleep lead to more mistakes, with an optimal amount around 7 hours.
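To make this concrete, here is a minimal sketch of how such data could be generated. The quadratic coefficients and the noise level are illustrative assumptions, not values from the article; the only property taken from the text is that the true curve is U-shaped with its minimum near 7 hours.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_mistakes(hours):
    """Hypothetical U-shaped ground truth: mistakes are minimized
    near 7 hours of sleep (coefficients chosen for illustration)."""
    return 2.0 * (hours - 7.0) ** 2 + 3.0

def sample_data(n=40, noise=2.0):
    """Draw noisy observations of the true sleep-vs-mistakes curve."""
    hours = rng.uniform(3.0, 11.0, size=n)
    mistakes = true_mistakes(hours) + rng.normal(0.0, noise, size=n)
    return hours, mistakes

hours, mistakes = sample_data()
```

A model never sees `true_mistakes` directly, only the noisy samples; that gap is exactly what the bias–variance discussion below is about.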
Before diving into the tradeoff, let's understand these two fundamental sources of error in machine learning models.
Bias is the error from overly simplistic assumptions in the learning algorithm. A model with high bias pays little attention to the training data and oversimplifies the problem.
Variance is the error from sensitivity to small fluctuations in the training data. A model with high variance pays too much attention to training data, including noise.
Here's the key insight: you can't minimize both at once. As you decrease bias (make the model more complex), variance typically increases. As you decrease variance (make the model simpler), bias typically increases.
The goal is to find the sweet spot where the reducible error (bias² + variance) is minimized. In the full decomposition, total expected error is bias² + variance + irreducible noise, but the noise term is fixed by the data, so only the first two can be traded off. This is what we'll explore interactively below.
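The decomposition can be estimated empirically: refit the same model class on many freshly drawn training sets, then measure how far the average prediction sits from the truth (bias²) and how much individual fits scatter around that average (variance). The sketch below assumes the same hypothetical quadratic ground truth and uses plain numpy polynomial fits.

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: 2.0 * (x - 7.0) ** 2 + 3.0   # assumed ground truth
x_test = np.linspace(3.0, 11.0, 50)

def bias_variance(degree, n_train=30, noise=2.0, n_rounds=200):
    """Estimate bias^2 and variance of a degree-`degree` polynomial
    fit by refitting on many independently drawn training sets."""
    preds = np.empty((n_rounds, x_test.size))
    for i in range(n_rounds):
        x = rng.uniform(3.0, 11.0, n_train)
        y = true_f(x) + rng.normal(0.0, noise, n_train)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
    variance = preds.var(axis=0).mean()
    return bias_sq, variance

b1, v1 = bias_variance(degree=1)   # too simple: large bias
b2, v2 = bias_variance(degree=2)   # matches the truth
b9, v9 = bias_variance(degree=9)   # too flexible: large variance
```

Running this shows the tradeoff numerically: the linear model's bias² dwarfs everything else, while the degree-9 model trades near-zero bias for much higher variance.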
Use the controls below to adjust the polynomial degree (model complexity) and noise level. Watch how the model fit and error curves change as you explore different settings.
[Interactive figure: model fit (dashed = true relationship, red = model prediction), alongside a U-shaped validation curve and a train-vs-validation error plot.]
Underfitting: the model is too simple. It cannot capture the underlying pattern. Both training and validation errors are high.
Sweet spot: model complexity matches the data. Validation error is minimized. The gap between training and validation error is small.
Overfitting: the model is too complex. It memorizes noise instead of learning patterns. Validation error increases.
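All three regimes can be reproduced in a few lines by sweeping the polynomial degree on a fixed train/validation split, which is essentially what the interactive controls above do. As before, the quadratic ground truth and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
true_f = lambda x: 2.0 * (x - 7.0) ** 2 + 3.0   # assumed ground truth

# One fixed train/validation split of noisy sleep-vs-mistakes data.
x = rng.uniform(3.0, 11.0, 60)
y = true_f(x) + rng.normal(0.0, 2.0, 60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

def errors_by_degree(max_degree=9):
    """Mean squared error on train and validation data per degree."""
    train_err, val_err = [], []
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x_tr, y_tr, d)
        train_err.append(np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2))
        val_err.append(np.mean((np.polyval(coeffs, x_va) - y_va) ** 2))
    return np.array(train_err), np.array(val_err)

train_err, val_err = errors_by_degree()
best = int(np.argmin(val_err)) + 1   # degree with lowest validation error
```

Training error only goes down as the degree grows, but validation error traces the U shape: high for degree 1 (underfitting), lowest near the true complexity, and rising again for high degrees (overfitting).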