Intentional overfitting? Discover the wild side of ML

Wednesday, February 5, 2025
Overfitting is often seen as taboo in machine learning: a model that memorizes its training data tends to perform poorly on unseen data. But can you imagine putting this overfitting to good use in ML?
Useful (or intentional) overfitting is a recognized ML model training pattern in which the usual regularization and generalization mechanisms are deliberately omitted while training the model. There is also no train/test split: when we overfit intentionally, the model is trained on the entire dataset.
But where can we apply this intentional overfitting, and why?
In some physical and dynamic systems, behavior can be modeled using theoretically and practically proven mathematical formulas. These formulas often involve Ordinary or Partial Differential Equations (ODEs/PDEs), most of which have no closed-form solution. To solve them on a computer, we must apply classical numerical methods, which consume a lot of time and resources. Even if we precomputed a lookup table of exact inputs and outputs, searching a very large table would itself take an awful lot of time.
So we can use an ML model to approximately model the system. We treat a lookup table covering the entire input and output space, derived from the accurate model, as our training data. The trained ML model can then generate fast approximate predictions that stay close to the real model.
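As a minimal sketch of the lookup-table step, suppose the "accurate model" is the closed-form projectile-range formula (standing in for an expensive numerical solver); the ranges, step sizes, and the `accurate_model` name are illustrative choices, not from the source:

```python
import numpy as np

# Hypothetical "accurate model": projectile range on flat ground,
# R = v^2 * sin(2*theta) / g  (a closed-form stand-in for an
# expensive numerical solver).
G = 9.81

def accurate_model(v, theta):
    return v ** 2 * np.sin(2.0 * theta) / G

# Build the lookup table over the *entire* finite input space:
# launch speeds 10..100 m/s in 1 m/s steps, launch angles
# 5..85 degrees in 1-degree steps.
speeds = np.arange(10.0, 101.0, 1.0)
angles = np.deg2rad(np.arange(5.0, 86.0, 1.0))
V, T = np.meshgrid(speeds, angles, indexing="ij")

X_train = np.column_stack([V.ravel(), T.ravel()])        # all inputs
y_train = accurate_model(X_train[:, 0], X_train[:, 1])   # exact outputs

print(X_train.shape, y_train.shape)  # (7371, 2) (7371,)
```

Every input the model will ever see at prediction time is one of these 7,371 rows, which is exactly what makes intentional overfitting safe here.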

In order to do this, the following conditions should be satisfied regarding the physical system:
- The whole input spectrum should be covered by the training data, and it should be finite (e.g., distances from 1 km to 500 km in 0.5 m increments), so that no unseen input is ever fed to the model.
- There should not be overlapping training samples.
- The physical system should not be a chaotic system, where slight changes in its initial conditions result in drastic changes in behavior.
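The three conditions above can be probed programmatically. The sketch below is an assumption of mine, not from the source: the finiteness and no-overlap checks are exact, while the chaos check is only a crude finite-difference sensitivity probe with a hypothetical `sensitivity_limit` threshold:

```python
import numpy as np

def check_overfit_conditions(X, f, sensitivity_limit=1e3, eps=1e-6):
    """Rough programmatic checks of the three conditions.

    X : (n, d) array holding the entire finite input space.
    f : the deterministic reference model, f(x) -> float.
    """
    # 1. Finite input space: we can at least verify the table is
    #    finite-valued (no NaN/inf); completeness is up to the caller.
    finite = bool(np.all(np.isfinite(X)))

    # 2. No overlapping samples: duplicate inputs would push the model
    #    back into probabilistic averaging instead of deterministic recall.
    no_overlap = len(np.unique(X, axis=0)) == len(X)

    # 3. Not chaotic (crude probe): nudge a subset of inputs and check
    #    that outputs do not change disproportionately.
    step = max(1, len(X) // 50)
    ratios = [
        abs(f(xi + eps) - f(xi)) / (eps * np.sqrt(xi.size))
        for xi in X[::step]
    ]
    non_chaotic = max(ratios) < sensitivity_limit

    return finite and no_overlap and non_chaotic
```

A smooth table such as `X = np.linspace(0, 1, 101).reshape(-1, 1)` with `f = lambda v: float(np.sin(v[0]))` passes all three checks, while appending a duplicate row makes the second check fail.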
Under these conditions, we can intentionally overfit our model, because no unseen data is ever used at prediction time: the model only needs to reproduce the patterns in the training data, unconditionally. Overlapping samples normally force an ML model into probabilistic prediction on the training data; since we have no overlapping samples, we can instead make deterministic predictions. ML models usually interpolate by weighting a query against its closest samples in the training data, which makes this approach better suited to non-chaotic systems. Physical systems also produce low-noise data and deterministic results, which makes intentionally overfitted models a good option.
But how can we confidently state that ML models can identify the patterns in complex physical systems? This is backed by the Universal Approximation Theorem in deep learning, which states that any continuous function can be approximated by an artificial neural network with at least one hidden layer and a non-linear (squashing) activation function. This means that no matter what function is given, even a simple NN can approximate it, and under suitable conditions its derivatives as well.
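To make this concrete, here is a minimal numpy sketch of exactly such a network: one hidden layer with a tanh (squashing) activation, trained by full-batch gradient descent on the entire input space of a smooth target, with no train/test split and no regularization. The target function, hidden width, learning rate, and iteration count are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Whole finite input space of a smooth target (a stand-in for a
# deterministic physical model); no split, no regularization --
# we *want* the network to memorize the table.
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(np.pi * x)

# One hidden layer with a tanh ("squashing") activation.
H = 32  # hidden units (illustrative choice)
W1 = rng.normal(0.0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0, (H, 1)); b2 = np.zeros(1)

lr = 0.1
losses = []
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)      # forward pass
    pred = h @ W2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backpropagation for the full-batch mean squared error.
    g_pred = 2.0 * err / len(x)
    gW2 = h.T @ g_pred
    gb2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h ** 2)   # tanh' = 1 - tanh^2
    gW1 = x.T @ g_h
    gb1 = g_h.sum(axis=0)

    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

The training error drives toward zero on the full table, which in this setting is the goal rather than a warning sign.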
But it is also important to state that intentional overfitting is applicable only in very limited scenarios and is not a silver-bullet solution. Even when the numerical solution is complex, if the input/output lookup table is small, then a direct search over the lookup table is the most appropriate solution.
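For a small table, that direct search is just a nearest-key lookup. A sketch with the standard library's `bisect` module, where the table values (here `y = x**2`) and the `lookup_nearest` name are illustrative:

```python
import bisect

# Small precomputed lookup table: sorted inputs and their exact outputs.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.0, 0.25, 1.0, 2.25, 4.0, 6.25, 9.0]  # here: y = x**2

def lookup_nearest(x):
    """Return the tabulated output for the nearest tabulated input."""
    i = bisect.bisect_left(xs, x)  # first index with xs[i] >= x
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    # Pick whichever neighbour is closer to x.
    return ys[i] if xs[i] - x < x - xs[i - 1] else ys[i - 1]

print(lookup_nearest(1.4))  # nearest key is 1.5 -> 2.25
```

This is O(log n) per query with zero approximation error at the tabulated points, so a trained model only pays off once the table grows too large to store or search.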
It’s clear that intentional overfitting flips the usual script on ML practices, turning what’s usually seen as a flaw into a strategic advantage for modeling certain physical systems. By purposefully overfitting, we can make ML models mimic complex, deterministic behaviors without the usual concerns for generalization or noise.
But let’s be clear, this approach only works under specific conditions: the input range must be finite and fully covered, the system must be non-chaotic, and there shouldn’t be overlapping samples. With the backing of the Universal Approximation Theorem, we know neural networks can capture these precise patterns, making this approach a smart choice when traditional numerical methods or lookup tables get too cumbersome.
So, while intentional overfitting isn’t a one-size-fits-all answer, it’s a clever and efficient alternative in the right scenarios, where speed and approximation matter the most.
References:
Machine Learning Design Patterns (book), by Valliappa Lakshmanan, Sara Robinson & Michael Munn