From: Ecological niche modeling of rabies in the changing Arctic of Alaska

Metric | Setting | Effect | Justification |
---|---|---|---|
Learnrate | AUTO | A detailed but slow model run | Known to give the best results when the algorithm ‘learns’ the data |
Subsample fraction | 50% | Internal testing while the model is grown | Standard approach for balanced tree models |
Logistic residual trim fraction | 0.10 | Fine-tuning | Allows for better fits |
Huber-M fraction of error squared | 0.90 | Accuracy level | A standard statistical threshold for certainty |
Optimal logistic model selection | Cross entropy | How to find the optimal model | Usually the best setting for tree-based models |
Number of trees to build | 1000 | Number of trees grown in search of the best solution | This number should greatly overshoot the known optimum |
Maximum number of nodes | 6 | Determines the node depth of the trees used | This number determines whether a ‘stump’ or a fully fit tree is run |
Terminal node minimum training cases | 10 | Number of cases for each tree branch split | For most data cases it provides a robust tree |
Maximum number of most-optimal models to save summary results | 1 | Only the single most-optimal model is saved | |
Regression loss criterion | Huber-M (Blend LS and LAD) | A statistical metric to express gain vs cost of a new rule | Standard approach in trees |
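
To illustrate how the Huber-M criterion blends least squares (LS) and least absolute deviation (LAD), the following is a minimal sketch and not the modeling software's own implementation. It assumes the 0.90 setting means that the smallest 90% of absolute residuals are penalized quadratically and the remaining 10% linearly. The function names `huber_delta` and `huber_loss` and the example residuals are hypothetical.

```python
import math


def huber_delta(residuals, fraction=0.90):
    """Return the cutoff at or below which `fraction` of absolute residuals fall.

    Assumption: a nearest-rank quantile of |residual| stands in for the
    'fraction of error squared' setting.
    """
    abs_sorted = sorted(abs(r) for r in residuals)
    # Small tolerance guards against floating-point overshoot in fraction * n.
    rank = max(1, math.ceil(fraction * len(abs_sorted) - 1e-9))
    return abs_sorted[rank - 1]


def huber_loss(residual, delta):
    """Quadratic (LS-like) loss up to delta, linear (LAD-like) loss beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# Hypothetical residuals: nine small errors and one large outlier.
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, 10.0]
delta = huber_delta(residuals, fraction=0.90)
losses = [huber_loss(r, delta) for r in residuals]

print("delta:", delta)
print("loss for the outlier:", losses[-1])
print("pure squared loss for the outlier:", 0.5 * 10.0 ** 2)
```

Under these assumptions the outlier contributes far less to the total loss than it would under a purely squared criterion, which is the practical reason for blending the two.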