Train-and-test isn’t the only approach

January 16, 2024

It’s worth keeping in mind that train-and-test is common, but it is not the only widely used approach in machine learning. Two of the more common alternatives are the hold-out approach and statistical methods.

  • Hold-out approach: The hold-out approach is like train-and-test, but instead of splitting the dataset into two parts, it is split into three: training, test (also known as validation), and hold-out. The training and test datasets work as described before. The hold-out dataset is a kind of test set that is used only once, when we are ready to deploy our model for real-world use (see the code sketch at the end of this post).
  • Statistical approach: Simpler models that originated in statistics often don’t need test datasets at all. Instead, we can calculate how likely the model is to be overfit directly, as a measure of statistical significance: a p-value.

These statistical methods are powerful, well established, and form the foundation of modern science. The advantage is that the training set never needs to be split, and we get a much more precise understanding of how confident we can be about a model. For example, a p-value of 0.01 means that results this strong would appear only 1% of the time if no real relationship existed, so there is a very small chance that our model has found a relationship that doesn’t actually exist in the real world. By contrast, a p-value of 0.5 means that results this strong would appear half the time even with purely random data: the model might look good on our training data, but in the real world it will be no better than flipping a coin.

The downside to these approaches is that they are only easily applied to certain model types, such as linear regression models. For all but the simplest models, the calculations can be extremely complex to perform properly.
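To make this concrete, here is a minimal sketch of the statistical approach using SciPy’s linregress, which fits a simple linear regression and reports a p-value for the slope directly, with no test set. The x and y arrays below are made-up illustrative data, not from this post:

#statistical approach: fit a simple linear regression and read off the p-value
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)  #a genuine linear relationship plus noise

result = linregress(x, y)
print(result.slope, result.pvalue)  #a tiny p-value suggests the relationship is real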

When building machine learning models, we should evaluate several different train/test splits; generally, splits that give the training set more data will yield better results.
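As a rough sketch of what that comparison might look like (assuming a feature matrix X, a target y, and a scikit-learn estimator; the split ratios here are arbitrary):

#compare a few train/test split ratios; X and y are assumed to be defined
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

for test_size in (0.1, 0.2, 0.3, 0.5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=2)
    model = LinearRegression().fit(X_train, y_train)
    print(test_size, model.score(X_test, y_test))  #R² score on the held-back data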

Scikit-Learn is a free machine learning library for Python. It supports both supervised and unsupervised machine learning, providing diverse algorithms for classification, regression, clustering, and dimensionality reduction.

#Python library
from sklearn.model_selection import train_test_split

#create train and test datasets (70% train, 30% test)
train, test = train_test_split(data, test_size=0.3, random_state=2)

#split the test portion again to carve out a hold-out set
test, holdout = train_test_split(test, test_size=0.5, random_state=2)
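Calling train_test_split twice like this is one simple way to produce the three subsets described above: 70% of the rows go to training, and the remaining 30% are divided evenly between a test set and a hold-out set that is touched only once, just before deployment.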
