Let’s try some hyper-parameter tuning for our polynomial fitting before we give up! Hyper-parameter tuning ====================== Setup ----- .. code:: ipython3 # First, import the libraries we will use import numpy as np import tesuract import matplotlib.pyplot as plt .. code:: ipython3 # import a data set for our regression problem from sklearn.datasets import make_friedman1 X,y = make_friedman1(n_samples=100,n_features=5) .. code:: ipython3 # rescale the input X = 2*X - 1 # center and scale the output as well for good measure (not required) y = (y - y.mean())/np.sqrt(np.var(y)) Grid search tuning ------------------ There are essentially three parameters to tune in the polynomial regression class. The first, and the most obvious, is the polynomial order, which has the keyword ``order`` in the constructor. The next is the type of polynomial interaction terms called ``mindex_type``. “total_orer” os the default, but an alternative is “hyperbolic” which has even fewer interaction terms which emphasizes more higher-order terms. In practice, this rarely leads to a better polynomial, but we can try it anyway. Last, but not least, there is the polynomial ``fit_type``, which determines the solver used to solve the least squares problem (Note even though polynomials are non-linear, the fitting boils down to a linear problem). This can be a bunch of different algorithms, but the three most widely used are ‘linear’, ‘LassoCV’, and ‘ElasticNetCV’. With these parameters in mind, we create a parameter grid just like one would when using the GridSearchCV method in sklearn. .. code:: ipython3 pce_grid = { 'order': list(range(1,12)), 'mindex_type': ['total_order','hyperbolic'], 'fit_type': ['linear','ElasticNetCV','LassoCV'], } Now we use the regression wrapper CV class which wraps the PCEReg class in sklearn’s grid search CV functionality. .. code:: ipython3 # hyper-parameter tune the PCE regression class using all available cores pce = tesuract.RegressionWrapperCV( regressor='pce', reg_params=pce_grid, n_jobs=-1, scorer='r2') pce.fit(X,y) print("Hyper-parameter CV PCE score is {0:.3f}".format(pce.best_score_)) .. parsed-literal:: Fitting 5 folds for each of 66 candidates, totalling 330 fits Fitting 5 folds for each of 66 candidates, totalling 330 fits Hyper-parameter CV PCE score is 0.999 Why so many fits? For each k-fold (5 total) we have to compute 66 fits corresponding to 66 different parameter combinations. This repeats five times to get an average cross validation score. Look at that! We got all the way to an R2 score of basically 1! How did we do that? One of our parameter combinations must have been really good. Which one was it? We can easily find out by called the best_params\_ attribute. .. code:: ipython3 pce.best_params_ .. parsed-literal:: {'fit_type': 'ElasticNetCV', 'mindex_type': 'total_order', 'order': 7} So it seems like 8th order way too high and probably overfit, so a fourth order was much better. Elastic net regularization also seemed to work the best, which uses a mix of l1 and l2 regularization. We can also extract the best scores, and the best estimator, i.e a PCEReg object with the fitted coefficients. .. code:: ipython3 pce_best = pce.best_estimator_ Now, to be fair, we probably should hyper-parameter tune the MLP regressor to perform a completely fair comparison, and it may probably give us ultimately a better model. In general however, neural networks are much hard to hyper-parameter tune and take longer to train, so the polynomial model can be preferred when accuracy and simplicity is required.