From: Ecological niche modeling of rabies in the changing Arctic of Alaska
Metric | Setting | Effect | Justification |
---|---|---|---|
Learnrate | AUTO | A detailed but slow model run | Known to provide the best results while the algorithm ‘learns’ the data |
Subsample fraction | 50% | Internal testing while model is grown | Standard approach for balanced tree models |
Logistic residual trim fraction | 0.10 | Fine-tuning | Allows for better fits |
Huber-M fraction of error squared | 0.90 | Accuracy level | A statistical standard threshold for certainty |
Optimal logistic model selection | Cross entropy | How to find the optimal model | Usually the best setting for tree-based models |
Number of trees to build | 1000 | Number of trees tried out for the best solution | This number should overshoot the known optimum by a wide margin |
Maximum number of nodes | 6 | Determines the node depth of trees used | This number determines whether a ‘stump’ or a fully fit tree is run |
Terminal node minimum training cases | 10 | Number of cases for each tree branch split | For most data cases it provides a robust tree |
Maximum number of most-optimal models to save summary results | 1 | Just 1 most-optimal model is saved | |
Regression loss criterion | Huber-M (Blend LS and LAD) | A statistical metric to express gain vs cost of a new rule | Standard approach in trees |
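
The Huber-M criterion in the table blends least squares (LS) and least absolute deviation (LAD). The minimal sketch below illustrates one common reading of the 0.90 setting: the 90% of cases with the smallest absolute residuals are penalized quadratically and the remaining 10% linearly. The function names, the nearest-rank quantile rule, and the example residuals are assumptions made for illustration; this is not the software's actual implementation.

```python
import math


def huber_delta(residuals, fraction=0.90):
    """Cutoff below which `fraction` of the absolute residuals fall.

    Uses a nearest-rank empirical quantile (an assumption for this sketch).
    """
    abs_sorted = sorted(abs(r) for r in residuals)
    k = max(0, math.ceil(fraction * len(abs_sorted)) - 1)
    return abs_sorted[k]


def huber_loss(residual, delta):
    """Squared error (LS) inside the cutoff, linear error (LAD) beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# Hypothetical residuals: nine small errors and one large outlier.
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, 10.0]
delta = huber_delta(residuals, fraction=0.90)
losses = [huber_loss(r, delta) for r in residuals]
```

With these example values the cutoff falls at the ninth-smallest absolute residual, so the single outlier contributes a linear penalty and has far less influence on the fit than it would under pure least squares.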