This example shows how to use the fitrauto function to automatically try a selection of regression model types with different hyperparameter values, given training predictor and response data. The function uses Bayesian optimization to select models and their hyperparameter values, and computes log(1 + valLoss) for each model, where valLoss is the cross-validation mean squared error (MSE). After the optimization is complete, fitrauto returns the model, trained on the entire data set, that is expected to best predict the responses for new data. Check the model performance on test data.
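To make the objective concrete, the following sketch computes the same quantity for a single 3-fold cross-validated regression tree. This code is illustrative only and is not part of the example workflow; the variables X and Y are hypothetical placeholder data.

rng(0) % For reproducibility of the sketch
X = rand(100,3); % Hypothetical predictor data
Y = X*[1; 2; 3] + 0.1*randn(100,1); % Hypothetical response data
cvMdl = fitrtree(X,Y,'KFold',3); % 3-fold cross-validated regression tree
valLoss = kfoldLoss(cvMdl); % Cross-validation MSE
objective = log(1 + valLoss) % The quantity fitrauto minimizes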
Load the sample data set NYCHousing2015, which includes 10 variables with information on the sales of properties in New York City in 2015. This example uses some of these variables to analyze the sale prices.
load NYCHousing2015
Instead of loading the sample data set NYCHousing2015, you can download the data from the NYC Open Data website and import the data as follows.
folder = 'Annualized_Rolling_Sales_Update';
ds = spreadsheetDatastore(folder,"TextType","string","NumHeaderLines",4);
ds.Files = ds.Files(contains(ds.Files,"2015"));
ds.SelectedVariableNames = ["BOROUGH","NEIGHBORHOOD","BUILDINGCLASSCATEGORY","RESIDENTIALUNITS", ...
    "COMMERCIALUNITS","LANDSQUAREFEET","GROSSSQUAREFEET","YEARBUILT","SALEPRICE","SALEDATE"];
NYCHousing2015 = readall(ds);
Preprocess the data set to choose the predictor variables of interest. Some of the preprocessing steps match those in the example Train Linear Regression Model.
First, change the variable names to lowercase for readability.
NYCHousing2015.Properties.VariableNames = lower(NYCHousing2015.Properties.VariableNames);
Next, remove samples with certain problematic values. For example, retain only those samples where at least one of the area measurements grosssquarefeet or landsquarefeet is nonzero. Assume that a saleprice of $0 indicates an ownership transfer without a cash consideration, and remove the samples with that saleprice value. Assume that a yearbuilt value of 1500 or less is a typo, and remove the corresponding samples.
NYCHousing2015(NYCHousing2015.grosssquarefeet == 0 & NYCHousing2015.landsquarefeet == 0,:) = [];
NYCHousing2015(NYCHousing2015.saleprice == 0,:) = [];
NYCHousing2015(NYCHousing2015.yearbuilt <= 1500,:) = [];
Convert the saledate variable, specified as a datetime array, into two numeric columns MM (month) and DD (day), and remove the saledate variable. Ignore the year values because all samples are for the year 2015.
[~,NYCHousing2015.MM,NYCHousing2015.DD] = ymd(NYCHousing2015.saledate);
NYCHousing2015.saledate = [];
The numeric values in the borough variable indicate the names of the boroughs. Change the variable to a categorical variable using the names.
NYCHousing2015.borough = categorical(NYCHousing2015.borough,1:5, ...
    ["Manhattan","Bronx","Brooklyn","Queens","Staten Island"]);
The neighborhood variable has 254 categories. Remove this variable for simplicity.
NYCHousing2015.neighborhood = [];
Convert the buildingclasscategory variable to a categorical variable, and explore the variable by using the wordcloud function.
NYCHousing2015.buildingclasscategory = categorical(NYCHousing2015.buildingclasscategory);
wordcloud(NYCHousing2015.buildingclasscategory);
Assume that you are interested only in one-, two-, and three-family dwellings. Find the sample indices for these dwellings and delete the other samples. Then, change the buildingclasscategory variable to an ordinal categorical variable, with integer-valued category names.
idx = ismember(string(NYCHousing2015.buildingclasscategory), ...
    ["01 ONE FAMILY DWELLINGS","02 TWO FAMILY DWELLINGS","03 THREE FAMILY DWELLINGS"]);
NYCHousing2015 = NYCHousing2015(idx,:);
NYCHousing2015.buildingclasscategory = categorical(NYCHousing2015.buildingclasscategory, ...
    ["01 ONE FAMILY DWELLINGS","02 TWO FAMILY DWELLINGS","03 THREE FAMILY DWELLINGS"], ...
    ["1","2","3"],'Ordinal',true);
The buildingclasscategory variable now indicates the number of families in one dwelling.
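Optionally, confirm the new category names. (This check is an addition to the example.)

categories(NYCHousing2015.buildingclasscategory)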
Explore the response variable saleprice by using the summary function.
s = summary(NYCHousing2015);
s.saleprice
ans = struct with fields:
Size: [24972 1]
Type: 'double'
Description: ''
Units: ''
Continuity: []
Min: 1
Median: 515000
Max: 37000000
NumMissing: 0
Create a histogram of the saleprice variable.
histogram(NYCHousing2015.saleprice)
Because the distribution of saleprice values is right-skewed, with all values greater than 0, log transform the saleprice variable.
NYCHousing2015.saleprice = log(NYCHousing2015.saleprice);
Similarly, transform the grosssquarefeet and landsquarefeet variables. Add a value of 1 before taking the logarithm of each variable, in case the variable is equal to 0.
NYCHousing2015.grosssquarefeet = log(1 + NYCHousing2015.grosssquarefeet);
NYCHousing2015.landsquarefeet = log(1 + NYCHousing2015.landsquarefeet);
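If you later need values on the original scale, for example, prices in dollars, you can invert these transforms. The following lines are an optional addition to the example:

priceDollars = exp(NYCHousing2015.saleprice); % Inverse of log(saleprice)
grossSqFt = exp(NYCHousing2015.grosssquarefeet) - 1; % Inverse of log(1 + grosssquarefeet)
landSqFt = exp(NYCHousing2015.landsquarefeet) - 1; % Inverse of log(1 + landsquarefeet)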
Partition the data set into a training set and a test set by using cvpartition. Use approximately 80% of the observations for the model selection and hyperparameter tuning process, and the other 20% to test the performance of the final model returned by fitrauto.
rng('default') % For reproducibility of the partition
c = cvpartition(length(NYCHousing2015.saleprice),'Holdout',0.2);
trainData = NYCHousing2015(training(c),:);
testData = NYCHousing2015(test(c),:);
Identify and remove the outliers of saleprice, grosssquarefeet, and landsquarefeet from the training data by using the isoutlier function.
[priceIdx,priceL,priceU] = isoutlier(trainData.saleprice);
trainData(priceIdx,:) = [];
[grossIdx,grossL,grossU] = isoutlier(trainData.grosssquarefeet);
trainData(grossIdx,:) = [];
[landIdx,landL,landU] = isoutlier(trainData.landsquarefeet);
trainData(landIdx,:) = [];
Remove the outliers of saleprice, grosssquarefeet, and landsquarefeet from the test data by using the same lower and upper thresholds computed on the training data.
testData(testData.saleprice < priceL | testData.saleprice > priceU,:) = [];
testData(testData.grosssquarefeet < grossL | testData.grosssquarefeet > grossU,:) = [];
testData(testData.landsquarefeet < landL | testData.landsquarefeet > landU,:) = [];
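Optionally, check how many observations remain in each set after outlier removal. (This check is an addition to the example.)

fprintf('Training set: %d observations; test set: %d observations\n', ...
    height(trainData),height(testData))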
Find an appropriate regression model for the data in trainData by using fitrauto. Try tree and ensemble learners and run the Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. To reduce the computation time, use 3-fold cross-validation, rather than 5-fold cross-validation, as part of the optimization process.
Because of the complexity of the optimization, this process can take some time, especially for larger data sets. By default, fitrauto provides a plot of the optimization and an iterative display of the optimization results. For more information on how to interpret these results, see Verbose Display.
options = struct('UseParallel',true,'Kfold',3);
[mdl,results] = fitrauto(trainData,'saleprice', ...
    'Learners',{'tree','ensemble'},'HyperparameterOptimizationOptions',options);
Copying objective function to workers...
Done copying objective function to workers.
Learner types to explore: ensemble, tree
Total iterations (MaxObjectiveEvaluations): 60
Total time (MaxTime): Inf
|=====================================================================================================================================================================|
| Iter | Active workers | Eval result | log(1 + valLoss) | Time for training & validation (sec) | Observed min log(1 + valLoss) | Estimated min log(1 + valLoss) | Learner | Hyperparameter: Value |
|=====================================================================================================================================================================|
| 1 | 5 | Accept | 0.25922 | 0.067333 | 0.18985 | 0.19575 | tree | MinLeafSize: 8676 |
| 2 | 5 | Best | 0.18985 | 0.14568 | 0.18985 | 0.19575 | tree | MinLeafSize: 245 |
| 3 | 2 | Accept | 0.25126 | 0.86908 | 0.1849 | 0.18985 | tree | MinLeafSize: 4 |
| 4 | 2 | Best | 0.1849 | 1.0049 | 0.1849 | 0.18985 | ensemble | Method: Bag, NumLearningCycles: 18, LearnRate: NaN, MinLeafSize: 110 |
| 5 | 2 | Accept | 0.25922 | 0.5705 | 0.1849 | 0.18985 | ensemble | Method: Bag, NumLearningCycles: 24, LearnRate: NaN, MinLeafSize: 8426 |
| 6 | 2 | Accept | 0.25126 | 0.87986 | 0.1849 | 0.18985 | tree | MinLeafSize: 4 |
| 7 | 6 | Accept | 0.21227 | 0.069611 | 0.1849 | 0.18985 | tree | MinLeafSize: 1722 |
| 8 | 4 | Accept | 0.18763 | 0.15728 | 0.1849 | 0.18538 | tree | MinLeafSize: 60 |
| 9 | 4 | Accept | 2.9803 | 0.47009 | 0.1849 | 0.18538 | ensemble | Method: LSBoost, NumLearningCycles: 20, LearnRate: 0.054589, MinLeafSize: 8499 |
| 10 | 4 | Accept | 0.1914 | 0.23832 | 0.1849 | 0.18538 | tree | MinLeafSize: 30 |
| 11 | 4 | Accept | 0.23248 | 0.057309 | 0.1849 | 0.18494 | tree | MinLeafSize: 3991 |
| 12 | 4 | Accept | 0.22354 | 2.9816 | 0.1849 | 0.18493 | ensemble | Method: Bag, NumLearningCycles: 121, LearnRate: NaN, MinLeafSize: 3071 |
| 13 | 4 | Accept | 4.7611 | 3.9969 | 0.1849 | 0.18494 | ensemble | Method: LSBoost, NumLearningCycles: 79, LearnRate: 0.0025753, MinLeafSize: 67 |
| 14 | 3 | Best | 0.17937 | 1.6426 | 0.17937 | 0.17945 | ensemble | Method: Bag, NumLearningCycles: 24, LearnRate: NaN, MinLeafSize: 16 |
| 15 | 3 | Accept | 0.25923 | 0.31222 | 0.17937 | 0.17945 | ensemble | Method: Bag, NumLearningCycles: 14, LearnRate: NaN, MinLeafSize: 7287 |
| 16 | 5 | Best | 0.17799 | 4.7324 | 0.17799 | 0.17945 | ensemble | Method: LSBoost, NumLearningCycles: 69, LearnRate: 0.19523, MinLeafSize: 209 |
| 17 | 5 | Accept | 0.19076 | 0.10145 | 0.17799 | 0.17945 | tree | MinLeafSize: 274 |
| 18 | 4 | Accept | 0.22517 | 0.079648 | 0.17799 | 0.17945 | tree | MinLeafSize: 2306 |
| 19 | 4 | Accept | 0.21507 | 0.32105 | 0.17799 | 0.17945 | tree | MinLeafSize: 10 |
| 20 | 4 | Accept | 0.18797 | 0.11404 | 0.17799 | 0.17945 | tree | MinLeafSize: 155 |
| 21 | 3 | Accept | 0.17862 | 5.6403 | 0.17799 | 0.17945 | ensemble | Method: Bag, NumLearningCycles: 54, LearnRate: NaN, MinLeafSize: 3 |
| 22 | 3 | Accept | 0.19413 | 0.16212 | 0.17799 | 0.17945 | tree | MinLeafSize: 25 |
| 23 | 6 | Accept | 0.24396 | 0.74378 | 0.17799 | 0.17945 | tree | MinLeafSize: 5 |
| 24 | 5 | Accept | 0.18986 | 0.16919 | 0.17799 | 0.17945 | tree | MinLeafSize: 39 |
| 25 | 5 | Accept | 0.19608 | 0.19077 | 0.17799 | 0.17945 | tree | MinLeafSize: 23 |
| 26 | 4 | Accept | 0.17828 | 11.436 | 0.17799 | 0.17803 | ensemble | Method: Bag, NumLearningCycles: 108, LearnRate: NaN, MinLeafSize: 3 |
| 27 | 4 | Accept | 0.1809 | 3.7272 | 0.17799 | 0.17803 | ensemble | Method: Bag, NumLearningCycles: 69, LearnRate: NaN, MinLeafSize: 55 |
| 28 | 4 | Accept | 0.18171 | 1.9361 | 0.17799 | 0.17803 | ensemble | Method: Bag, NumLearningCycles: 19, LearnRate: NaN, MinLeafSize: 4 |
| 29 | 4 | Accept | 0.17959 | 8.6553 | 0.17799 | 0.17803 | ensemble | Method: Bag, NumLearningCycles: 75, LearnRate: NaN, MinLeafSize: 1 |
| 30 | 3 | Accept | 0.20204 | 15.893 | 0.17762 | 0.17676 | ensemble | Method: Bag, NumLearningCycles: 496, LearnRate: NaN, MinLeafSize: 910 |
| 31 | 3 | Best | 0.17762 | 6.6202 | 0.17762 | 0.17676 | ensemble | Method: Bag, NumLearningCycles: 95, LearnRate: NaN, MinLeafSize: 13 |
| 32 | 5 | Accept | 0.19444 | 5.563 | 0.17762 | 0.17765 | ensemble | Method: LSBoost, NumLearningCycles: 103, LearnRate: 0.9936, MinLeafSize: 98 |
| 33 | 5 | Accept | 0.18056 | 0.75592 | 0.17762 | 0.17765 | ensemble | Method: LSBoost, NumLearningCycles: 11, LearnRate: 0.49541, MinLeafSize: 222 |
| 34 | 4 | Accept | 0.18768 | 0.78702 | 0.17762 | 0.17765 | ensemble | Method: LSBoost, NumLearningCycles: 13, LearnRate: 0.98545, MinLeafSize: 1 |
| 35 | 4 | Accept | 0.26635 | 1.1019 | 0.17762 | 0.17765 | tree | MinLeafSize: 2 |
| 36 | 4 | Accept | 0.206 | 4.2801 | 0.17762 | 0.17783 | ensemble | Method: Bag, NumLearningCycles: 142, LearnRate: NaN, MinLeafSize: 1241 |
| 37 | 3 | Accept | 0.21503 | 11.309 | 0.17762 | 0.17764 | ensemble | Method: LSBoost, NumLearningCycles: 230, LearnRate: 0.017904, MinLeafSize: 1 |
| 38 | 3 | Accept | 0.3789 | 0.55978 | 0.17762 | 0.17764 | ensemble | Method: LSBoost, NumLearningCycles: 10, LearnRate: 0.27761, MinLeafSize: 5 |
| 39 | 6 | Accept | 0.23053 | 0.51383 | 0.17762 | 0.17764 | tree | MinLeafSize: 7 |
| 40 | 6 | Accept | 0.17996 | 2.7932 | 0.17762 | 0.17764 | ensemble | Method: LSBoost, NumLearningCycles: 49, LearnRate: 0.11514, MinLeafSize: 1 |
| 41 | 6 | Accept | 0.23707 | 12.289 | 0.17762 | 0.17767 | ensemble | Method: Bag, NumLearningCycles: 480, LearnRate: NaN, MinLeafSize: 3779 |
| 42 | 5 | Accept | 0.18527 | 26.878 | 0.17762 | 0.17766 | ensemble | Method: LSBoost, NumLearningCycles: 491, LearnRate: 0.65315, MinLeafSize: 1044 |
| 43 | 5 | Accept | 0.18276 | 4.678 | 0.17762 | 0.17766 | ensemble | Method: LSBoost, NumLearningCycles: 66, LearnRate: 0.81673, MinLeafSize: 635 |
| 44 | 5 | Accept | 0.25057 | 22.826 | 0.17762 | 0.17765 | ensemble | Method: LSBoost, NumLearningCycles: 412, LearnRate: 0.92469, MinLeafSize: 1 |
| 45 | 5 | Accept | 4.61 | 0.60326 | 0.17762 | 0.17779 | ensemble | Method: LSBoost, NumLearningCycles: 10, LearnRate: 0.027624, MinLeafSize: 1 |
| 46 | 4 | Accept | 0.2001 | 25.14 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 465, LearnRate: 0.0094768, MinLeafSize: 23 |
| 47 | 4 | Accept | 0.20319 | 3.6011 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 70, LearnRate: 0.98445, MinLeafSize: 4 |
| 48 | 4 | Accept | 0.18106 | 26.642 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 495, LearnRate: 0.091101, MinLeafSize: 1 |
| 49 | 4 | Accept | 0.25922 | 7.6886 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 493, LearnRate: 0.024602, MinLeafSize: 8658 |
| 50 | 4 | Accept | 0.18162 | 4.8133 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 94, LearnRate: 0.30945, MinLeafSize: 1 |
| 51 | 5 | Accept | 0.17788 | 5.18 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 102, LearnRate: 0.11411, MinLeafSize: 4 |
| 52 | 5 | Accept | 0.1858 | 36.145 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 499, LearnRate: 0.34801, MinLeafSize: 181 |
| 53 | 5 | Accept | 0.17946 | 1.3924 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 24, LearnRate: 0.47121, MinLeafSize: 29 |
| 54 | 6 | Accept | 0.17837 | 8.0619 | 0.17762 | 0.17761 | ensemble | Method: LSBoost, NumLearningCycles: 142, LearnRate: 0.076159, MinLeafSize: 1 |
| 55 | 6 | Accept | 0.17766 | 25.949 | 0.17762 | 0.17761 | ensemble | Method: LSBoost, NumLearningCycles: 495, LearnRate: 0.024193, MinLeafSize: 7 |
| 56 | 6 | Accept | 1.2951 | 25.877 | 0.17762 | 0.17762 | ensemble | Method: LSBoost, NumLearningCycles: 475, LearnRate: 0.0044769, MinLeafSize: 5 |
| 57 | 6 | Best | 0.17753 | 28.063 | 0.17753 | 0.17751 | ensemble | Method: LSBoost, NumLearningCycles: 486, LearnRate: 0.038349, MinLeafSize: 39 |
| 58 | 6 | Accept | 0.17978 | 2.0228 | 0.17753 | 0.17751 | ensemble | Method: LSBoost, NumLearningCycles: 34, LearnRate: 0.3482, MinLeafSize: 3 |
| 59 | 6 | Accept | 2.6442 | 0.64641 | 0.17753 | 0.17751 | ensemble | Method: LSBoost, NumLearningCycles: 10, LearnRate: 0.12206, MinLeafSize: 65 |
| 60 | 5 | Accept | 0.97054 | 16.16 | 0.17753 | 0.17751 | ensemble | Method: LSBoost, NumLearningCycles: 498, LearnRate: 0.0048163, MinLeafSize: 3269 |
| 61 | 5 | Accept | 0.2084 | 0.6174 | 0.17753 | 0.17751 | ensemble | Method: Bag, NumLearningCycles: 15, LearnRate: NaN, MinLeafSize: 997 |
|=====================================================================================================================================================================|
__________________________________________________________
Optimization completed.
Total iterations: 61
Total elapsed time: 150.2458 seconds
Total time for training and validation: 386.9239 seconds

Best observed learner is an ensemble model with:
    Method: LSBoost
    NumLearningCycles: 486
    LearnRate: 0.038349
    MinLeafSize: 39
Observed log(1 + valLoss): 0.17753
Time for training and validation: 28.0634 seconds

Best estimated learner (returned model) is an ensemble model with:
    Method: LSBoost
    NumLearningCycles: 486
    LearnRate: 0.038349
    MinLeafSize: 39
Estimated log(1 + valLoss): 0.17751
Estimated time for training and validation: 28.7325 seconds
The final model returned by fitrauto corresponds to the best estimated learner. Before returning the model, the function retrains it using the entire training data (trainData), the listed Learner (or model) type, and the displayed hyperparameter values.
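You can also query the returned optimization results directly. For example, because results is a BayesianOptimization object, the standard bestPoint method returns the best point found and its criterion value. (This step is an optional addition to the example.)

[x,criterionValue] = bestPoint(results) % Best learner type and hyperparameter values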
Evaluate the performance of the returned model mdl on the test set testData. Compute the test set mean squared error (MSE), and take a log transform of the MSE to match the values in the verbose display of fitrauto. Smaller MSE (and log-transformed MSE) values indicate better performance.
testMSE = loss(mdl,testData,'saleprice');
testError = log(1 + testMSE)
testError = 0.1791
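Because the responses are log-transformed sale prices, you can roughly translate this error to the price scale: the test set RMSE on the log scale corresponds approximately to a typical relative error in predicted price. This back-of-the-envelope interpretation is an addition to the example, not part of it.

testRMSE = sqrt(testMSE); % RMSE on the log-price scale
relativeError = exp(testRMSE) - 1 % Approximate typical relative price error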
Compare the predicted test set response values to the true response values. Plot the predicted sale price along the vertical axis and the true sale price along the horizontal axis. Points on the reference line indicate correct predictions. A good model produces predictions that are scattered near the line.
testPredictions = predict(mdl,testData);
plot(testData.saleprice,testPredictions,'.')
hold on
plot(testData.saleprice,testData.saleprice) % Reference line
hold off
xlabel(["True Sale Price","(log transformed)"])
ylabel(["Predicted Sale Price","(log transformed)"])
Use box plots to compare the distribution of predicted and true sale prices by borough. Create the box plots by using the boxchart function. Each box plot displays the median, the lower and upper quartiles, any outliers (computed using the interquartile range), and the minimum and maximum values that are not outliers. In particular, the line inside each box is the sample median, and the circular markers indicate outliers.
For each borough, compare the red box plot (showing the distribution of predicted prices) to the blue box plot (showing the distribution of true prices). Similar distributions for the predicted and true sale prices indicate good predictions.
boxchart(testData.borough,testData.saleprice)
hold on
boxchart(testData.borough,testPredictions)
hold off
legend(["True Sale Prices","Predicted Sale Prices"])
xlabel("Borough")
ylabel(["Sale Price","(log transformed)"])
For all the boroughs, the predicted median sale price closely matches the median true sale price. The predicted sale prices seem to vary less than the true sale prices.
Display box charts that compare the distribution of predicted and true sale prices by the number of families in a dwelling.
boxchart(testData.buildingclasscategory,testData.saleprice)
hold on
boxchart(testData.buildingclasscategory,testPredictions)
hold off
legend(["True Sale Prices","Predicted Sale Prices"])
xlabel("Number of Families in Dwelling")
ylabel(["Sale Price","(log transformed)"])
For all dwellings, the predicted median sale price closely matches the median true sale price. The predicted sale prices seem to vary less than the true sale prices.
Plot a histogram of the test set residuals, and check that they are normally distributed. (Recall that the sale prices are log-transformed.)
testResiduals = testData.saleprice - testPredictions;
histogram(testResiduals)
title('Test Set Residuals')
Although the histogram is slightly left-skewed, it is approximately symmetric about 0.
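For a more direct check of normality than a histogram, you can create a normal probability plot of the residuals. (This extra diagnostic is an addition to the example.)

qqplot(testResiduals) % Points near the line indicate approximate normality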
See Also: BayesianOptimization | boxchart | fitrauto | histogram