LightGBM - Feature Interaction Constraints
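Once LightGBM has trained an ensemble of trees on a dataset, every internal node of a tree represents a condition on a feature value. To make a prediction with a single tree, we start at the root node and compare the node's feature condition with the corresponding feature value of our sample, choosing a branch accordingly. Following these decisions along a specific path leads to a leaf node, which yields the final prediction. By default, there are no restrictions on which features may appear at which nodes.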
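This process of walking through a tree's nodes and evaluating feature conditions to reach a final decision is called feature interaction, because the predictor reaches a given node only after evaluating the conditions at the preceding nodes. LightGBM lets us decide which features may interact with one another. We can define groups of feature indices; only features within the same group may interact. They cannot interact with features outside the group, and this restriction is enforced while the trees are built during training.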
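The rule described above can be sketched in a few lines of plain Python. This is a simplified model of the idea, not LightGBM's actual implementation: the function name allowed_next_features is hypothetical, and it assumes a feature may be used for the next split on a branch only if some constraint group contains it together with every feature already used on that branch.

```python
from typing import Iterable, List, Set


def allowed_next_features(used: Iterable[int], groups: List[List[int]]) -> Set[int]:
    """Return the features that may be split on next, given the features
    already used on the current root-to-node branch (simplified model)."""
    used_set = set(used)
    # Features already on the branch can always be reused.
    allowed = set(used_set)
    for group in groups:
        # A group is compatible if it contains every feature used so far.
        if used_set.issubset(group):
            allowed.update(group)
    return allowed


# The constraint groups used in the examples of this chapter.
groups = [[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]

print(allowed_next_features([], groups))       # at the root: any listed feature
print(allowed_next_features([3], groups))      # after splitting on feature 3
print(allowed_next_features([0, 11], groups))  # after splitting on features 0 and 11
```

Under this model, a branch that starts with feature 3 can only continue with features 3 or 4, so feature 3 never ends up on the same path as, say, feature 7.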
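LightGBM estimators expose this through a parameter named interaction_constraints. It accepts a list of lists, where each inner list contains the indices of the features that are allowed to interact with one another. The examples below show how to enforce feature interaction constraints on an estimator.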
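Because interaction_constraints works with column indices, it can be easier to write the groups with feature names and convert them. The following is a minimal sketch: the helper to_index_groups is hypothetical (it is not part of LightGBM), and the name groups are simply the ones that correspond to the index groups used in the examples of this chapter, assuming the standard column order of the Boston housing dataset.

```python
from typing import List, Sequence


def to_index_groups(
    name_groups: Sequence[Sequence[str]], feature_names: Sequence[str]
) -> List[List[int]]:
    """Convert groups of feature names into groups of column indices."""
    position = {name: idx for idx, name in enumerate(feature_names)}
    return [[position[name] for name in group] for group in name_groups]


# Column order of the Boston housing dataset.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Groups of features that are allowed to interact with one another.
name_groups = [
    ["CRIM", "ZN", "INDUS", "B", "LSTAT"],
    ["CHAS", "NOX"],
    ["AGE", "PTRATIO"],
    ["RM", "TAX"],
    ["DIS", "RAD"],
]

constraints = to_index_groups(name_groups, feature_names)
print(constraints)
```

The resulting list of lists can be passed as the value of interaction_constraints.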
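Example 1
The following example enforces feature interaction constraints through the params dictionary passed to lgb.train().
Note: the load_boston function from sklearn.datasets was deprecated in scikit-learn 1.0 and removed in version 1.2. If the import fails, load the dataset from an external source or use an alternative dataset.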
# Import necessary libraries
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the Boston housing dataset
boston = load_boston()

# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

# Print the size of the training and testing sets
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

# Create LightGBM datasets
train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

# Train the LightGBM model; each inner list names features that may interact
booster = lgb.train(
    {
        "objective": "regression",
        "verbosity": -1,
        "metric": "rmse",
        "interaction_constraints": [[0,1,2,11,12], [3,4], [6,10], [5,9], [7,8]],
    },
    train_set=train_dataset,
    valid_sets=(test_dataset,),
    num_boost_round=10,
)

# Make predictions
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Calculate and print R2 scores
print("\nR2 Test Score : %.2f" % r2_score(Y_test, test_preds))
print("R2 Train Score : %.2f" % r2_score(Y_train, train_preds))
Output
This produces the following result:
Sizes of Train or Test Datasets :  (455, 13) (51, 13) (455,) (51,)

[1] valid_0's rmse: 7.50225
[2] valid_0's rmse: 7.01989
[3] valid_0's rmse: 6.58246
[4] valid_0's rmse: 6.18581
[5] valid_0's rmse: 5.83873
[6] valid_0's rmse: 5.47166
[7] valid_0's rmse: 5.19667
[8] valid_0's rmse: 4.96259
[9] valid_0's rmse: 4.69168
[10] valid_0's rmse: 4.51653

R2 Test Score : 0.67
R2 Train Score : 0.69
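Example 2
The following code trains a LightGBM model to predict house prices on the Boston dataset, this time passing interaction_constraints to the scikit-learn style LGBMModel estimator. After training, it uses the R2 score to measure how well the model performs on the training and test data.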
# Import necessary libraries
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the Boston housing dataset
boston = load_boston()

# Split the dataset into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Print the size of the training and testing sets
print("Sizes of Training and Testing Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

# Create a LightGBM model with interaction constraints and 10 estimators
booster = lgb.LGBMModel(
    objective="regression",
    n_estimators=10,
    interaction_constraints=[[0,1,2,11,12], [3, 4], [6,10], [5,9], [7,8]],
)

# Train the model on the training set and validate it on the test set
booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test)], eval_metric="rmse")

# Make predictions on both the test and training sets
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Calculate and print the R2 score for the test and training sets
print("\nR2 Test Score : %.2f" % r2_score(Y_test, test_preds))
print("R2 Train Score : %.2f" % r2_score(Y_train, train_preds))
Output
This produces the following result:
Sizes of Training and Testing Datasets :  (379, 13) (127, 13) (379,) (127,)
[1] valid_0's rmse: 8.97871 valid_0's l2: 80.6173
[2] valid_0's rmse: 8.35545 valid_0's l2: 69.8135
[3] valid_0's rmse: 7.93432 valid_0's l2: 62.9535
[4] valid_0's rmse: 7.61104 valid_0's l2: 57.9279
[5] valid_0's rmse: 7.16832 valid_0's l2: 51.3849
[6] valid_0's rmse: 6.93182 valid_0's l2: 48.0501
[7] valid_0's rmse: 6.57728 valid_0's l2: 43.2606
[8] valid_0's rmse: 6.41497 valid_0's l2: 41.1518
[9] valid_0's rmse: 6.13983 valid_0's l2: 37.6976
[10] valid_0's rmse: 5.9864 valid_0's l2: 35.837

R2 Test Score : 0.60
R2 Train Score : 0.69