How to understand an XGBoost model dump


I noticed that Spark XGBoost does not have an API like trees_to_dataframe() in the Python API. I am trying to parse the getModelDump result instead, but I am confused by its format: which fields represent what, and so on.

My model parameters are set as follows:

xgbParams = {
    'n_estimators': 200,
    'max_depth': 2,
    'eta': 0.05,
    'lambda': 1,
    'gamma': 4,
    'alpha': 0.1,
    'subsample': 0.8,
    # 'min_child_weight': 1,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'colsample_bylevel': 0.8,
    'eval_metric': 'logloss',
    'seed': 1122,
    'missing': -999999999
}

// train xgb_model in spark version of xgboost
scala> xgb_model
res18: ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel = xgbc_89286dd04aa3

scala> xgb_model.nativeBooster.getModelDump(null,true);
res19: Array[String] =
Array("0:[f1<53] yes=1,no=2,missing=2,gain=58047.7812,cover=336165
        1:[f3<53.9500008] yes=3,no=4,missing=3,gain=24677.3848,cover=63748.25
                3:leaf=-0.0531237721,cover=53626.5
                4:leaf=0.031994272,cover=10121.75
        2:[f16<1.66669905] yes=5,no=6,missing=6,gain=10181.9785,cover=272416.75
                5:leaf=-0.0937986076,cover=268367
                6:leaf=-0.0139159411,cover=4049.75
","0:[f1<51] yes=1,gain=52816.4062,cover=336097.594
        1:[f8<369.570007] yes=3,missing=4,gain=22681.3555,cover=60529.668
                3:leaf=-0.0121749714,cover=37363.5625
                4:leaf=-0.0751453713,cover=23166.1055
        2:[f16<1.67979908] yes=5,gain=10274.8359,cover=275567.906
                5:leaf=-0.089068912,cover=271300.188
                6:leaf=-0.0108754979,cover=4267.74268
","0:[f1<56] yes=1,gain=4887...

scala> res19.size
res20: Int = 200

I think res19.size = 200 makes sense, since I set n_estimators to 200. What confuses me is the format of each string in res19. I assume a name like f2 must represent some specific feature, but how do I find the corresponding feature name? Also, what do 0, 1, and 2 stand for? And what does yes=3,no=4 mean?

Thanks in advance!!
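For anyone wanting something close to trees_to_dataframe(), the dump can be parsed line by line. In the text dump format, each array element is one tree, and the number before the colon is the node id within that tree. A split line such as [f1<53] tests the feature at index 1 against the threshold 53. The yes, no and missing fields give the child node ids to follow when the test is true, false, or the value is missing. Leaf lines carry the leaf value, and gain and cover are per-node statistics. The following is a minimal Python sketch under those assumptions, not an official API. The names parse_tree, parse_stats and the feature_names argument are hypothetical helpers. When no feature map is passed to getModelDump, the index in fN is assumed to be the position in the assembled feature vector.

```python
import re

# Split node, e.g. "1:[f3<53.95] yes=3,no=4,missing=3,gain=24677.38,cover=63748.25"
SPLIT_RE = re.compile(r"^(\d+):\[(\w+)<([^\]]+)\]\s*(.*)$")
# Leaf node, e.g. "3:leaf=-0.0531,cover=53626.5"
LEAF_RE = re.compile(r"^(\d+):leaf=([^,]+)(?:,(.*))?$")


def parse_stats(text):
    """Turn 'yes=3,no=4,gain=1.5' into a dict; fields absent from the text are not set."""
    stats = {}
    for item in filter(None, (text or "").split(",")):
        key, _, value = item.partition("=")
        key = key.strip()
        if key in ("yes", "no", "missing"):
            stats[key] = int(value)      # child node ids
        elif key in ("gain", "cover"):
            stats[key] = float(value)    # per-node statistics
        else:
            stats[key] = value           # unknown field: keep the raw string
    return stats


def parse_tree(tree_index, dump, feature_names=None):
    """Parse one dump string into a list of row dicts, one per node.

    feature_names is a hypothetical, optional list such that feature_names[i]
    is the column at position i of the feature vector.
    """
    rows = []
    for raw in dump.splitlines():
        line = raw.strip()  # indentation only encodes depth, so it can be dropped
        if not line:
            continue
        split = SPLIT_RE.match(line)
        if split:
            node_id, feature, threshold, rest = split.groups()
            index = re.fullmatch(r"f(\d+)", feature)
            if feature_names is not None and index:
                feature = feature_names[int(index.group(1))]
            row = {"tree": tree_index, "node": int(node_id),
                   "feature": feature, "threshold": float(threshold)}
        else:
            leaf = LEAF_RE.match(line)
            if leaf is None:
                continue  # skip lines this sketch does not recognize
            node_id, value, rest = leaf.groups()
            row = {"tree": tree_index, "node": int(node_id),
                   "feature": "Leaf", "leaf": float(value)}
        row.update(parse_stats(rest))
        rows.append(row)
    return rows


# First tree from the dump in the question.
SAMPLE_TREE = """0:[f1<53] yes=1,no=2,missing=2,gain=58047.7812,cover=336165
        1:[f3<53.9500008] yes=3,no=4,missing=3,gain=24677.3848,cover=63748.25
                3:leaf=-0.0531237721,cover=53626.5
                4:leaf=0.031994272,cover=10121.75
        2:[f16<1.66669905] yes=5,no=6,missing=6,gain=10181.9785,cover=272416.75
                5:leaf=-0.0937986076,cover=268367
                6:leaf=-0.0139159411,cover=4049.75
"""

rows = parse_tree(0, SAMPLE_TREE)
for row in rows:
    print(row)
```

To cover the whole model, call parse_tree once per element of the dump array with its index, then concatenate the rows. The same two regular expressions should port to Scala with little change.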