Train-Validation-Test sets


Train:

a subset of the data used to fit the model

Validation:

a subset of the data used to **decide between models**

Test set:

a subset of the data used to estimate error
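
As a minimal sketch, a three-way split can be made with scikit-learn's train_test_split (the 60/20/20 proportions and the toy data are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumed): 100 observations, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# First hold out the test set (20% of the data)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then split the remainder into train (60%) and validation (20% of the total)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Fit on X_train, compare candidate models on X_val, estimate error once on X_test
```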

Train - Test split:

  • Most commonly used is the random split:
    • Independent events
  • Time-based split:
    • Stock market
  • Split by id
    • Multiple observations per person (see the group-split sketch below)

How to select:

  • What is the sampling unit?
  • How are we going to use the model?
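
For the split-by-id case, a sketch with scikit-learn's GroupShuffleSplit (the toy person ids are assumptions) keeps all of a person's observations on the same side of the split:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(24).reshape(12, 2)                             # 12 toy observations
y = np.arange(12)
person_id = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5])   # multiple rows per person

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=person_id))

# Every person's rows end up entirely in train or entirely in test
assert set(person_id[train_idx]).isdisjoint(person_id[test_idx])
```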

Grid Search vs. Random Search
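
A minimal sketch contrasting the two with scikit-learn; the Ridge estimator and the alpha ranges are illustrative assumptions:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

# Grid search: evaluates every combination in a fixed grid
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)

# Random search: samples a fixed budget of candidates from a distribution
rand = RandomizedSearchCV(Ridge(), param_distributions={"alpha": loguniform(1e-3, 1e2)},
                          n_iter=10, cv=5, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```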

Cross validation:

  • Leave-one-out:
  • k-fold cross-validation:
    • Randomize observations
    • Divide in k groups(folds) of approximately equal size
    • Use one fold for testing and the rest for training
    • Repeat k times
    • Average the error over all k test folds
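
A minimal sketch of the k-fold procedure above (the dataset, estimator, and k = 5 are assumptions):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)

# Randomize observations and divide them into k folds of roughly equal size
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_errors = []
for train_idx, test_idx in kf.split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])       # train on k-1 folds
    pred = model.predict(X[test_idx])                      # evaluate on the held-out fold
    fold_errors.append(mean_squared_error(y[test_idx], pred))

print(np.mean(fold_errors))                                # average over all k test folds
```
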
Model selection vs. estimating test error

Feature selection during model selection:

Scale the training set after the train-test split.

Feature selection:

Normalizing data & selecting features should be done inside the training set only (fit on the training data, then apply to validation/test).
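
One way to enforce this is to make the scaler and feature selector steps of a Pipeline, so every cross-validation fold re-fits them on its training portion only (the dataset, estimator, and k = 10 selected features are assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling and selection live inside the pipeline, so in each fold they are
# fit on the training portion only and merely applied to the held-out portion.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("model", LogisticRegression(max_iter=1000)),
])

print(cross_val_score(pipe, X, y, cv=5).mean())
```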

Train and validation in Time Series:

  • Consider how the data was collected (e.g., a pattern every year or month)
  • Decide how much of the future the validation set should cover (half a cycle or a full cycle); see the split sketch below
    1. Examine the pattern of the data to choose the train and validation sets
    2. Train is always the history and validation is always the future
    3. A test set is optional; only hold one out if you are asked to
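
A minimal sketch of such a time-ordered split (the monthly frequency and the one-cycle validation window are assumptions):

```python
import numpy as np
import pandas as pd

# Assumed: five years of monthly observations
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
y = pd.Series(np.random.default_rng(0).normal(size=60), index=idx)

# Validation = the most recent full yearly cycle; train = all the history before it
val_size = 12
train, validation = y.iloc[:-val_size], y.iloc[-val_size:]
```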

Steps:

  • Split the data into train + test
  • Use the train set to fit ARMA(p, q) models with different p and q
  • Each model generates a forecast of length len(test)
  • Compare RMSE(test, forecast) for each model and choose the model with the smallest RMSE (sketch below)

Note: in time series everything needs to follow time order, so the cross-validation process works differently.
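
A sketch of these steps with statsmodels' ARIMA (the toy series, the p/q ranges, and the 20-point test window are assumptions; ARMA(p, q) is fit as ARIMA with d = 0):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# Assumed toy series; replace with the real data
y = pd.Series(np.random.default_rng(0).normal(size=200))

# Split the data into train + test (last 20 points held out)
train, test = y[:-20], y[-20:]

rmse = {}
for p in range(3):
    for q in range(3):
        fit = ARIMA(train, order=(p, 0, q)).fit()    # ARMA(p, q) = ARIMA with d = 0
        forecast = fit.forecast(steps=len(test))     # forecast of length len(test)
        rmse[(p, q)] = np.sqrt(mean_squared_error(test, forecast))

best = min(rmse, key=rmse.get)                       # model with the smallest RMSE
print(best, rmse[best])
```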

K-fold: Expanding window cross-validation

  • K (e.g., 3, 5, or 10)
    • Consider the training time: a smaller k takes less time
  • Validation size
    • Depends on the study and how far ahead we want to forecast; choose a validation size equal to the final forecast window
  • Rolling size:
    • Use all the data; most likely the same as the validation size
  • Initial training size:
    • n - k * validation size
Try to design the size h that the train and validation sets have.
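
A sketch of the expanding-window scheme with scikit-learn's TimeSeriesSplit (n = 100, k = 5, and a 12-step validation window are assumptions):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.random.default_rng(0).normal(size=(100, 1))   # toy series of length n = 100

# k = 5 folds, each with a 12-step validation window; the training window expands,
# so the initial training size is n - k * validation = 100 - 5 * 12 = 40
tscv = TimeSeriesSplit(n_splits=5, test_size=12)
for train_idx, val_idx in tscv.split(X):
    print(f"train 0..{train_idx[-1]}  |  validation {val_idx[0]}..{val_idx[-1]}")
```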

Def:

We are interested in the generalization error:

  • Expected value of the error on future data
  • Estimated by computing the error on a large, independent **test** set
