LightGBM - Feature Interaction Constraints



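Once LightGBM has finished training an ensemble of trees on a dataset, every internal node of each tree represents a condition on a feature value. To make a prediction with a single tree, we start at the root node and compare the feature condition stored in the node with the corresponding feature value of our sample. The result of that comparison decides which branch to follow. Following a particular path in this way leads to a leaf node, which yields the final prediction. By default, there is no restriction on which features may appear at which nodes.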
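The traversal described above can be sketched in a few lines of plain Python. The nested-dictionary tree, its keys (`feature`, `threshold`, `left`, `right`, `value`), and the `predict_one` helper are all hypothetical names made up for this illustration; they are not LightGBM's internal representation.

```python
# A tiny hand-written tree. Internal nodes test "sample[feature] <= threshold";
# leaves carry the predicted value. The layout is illustrative only.
tree = {
    "feature": 0, "threshold": 5.0,
    "left": {"value": 10.0},
    "right": {
        "feature": 1, "threshold": 2.5,
        "left": {"value": 20.0},
        "right": {"value": 30.0},
    },
}

def predict_one(node, sample):
    """Walk from the root to a leaf; return (prediction, features used on the path)."""
    path = []
    while "value" not in node:          # keep going until we reach a leaf
        path.append(node["feature"])
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"], path

print(predict_one(tree, [7.0, 1.0]))   # goes right at feature 0, then left at feature 1
print(predict_one(tree, [3.0, 9.0]))   # goes left at feature 0 and stops at a leaf
```

The list of features collected along the path is what matters for the rest of this chapter: features that appear together on one root-to-leaf path are the ones that interact.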

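Reaching a final decision by walking through a tree's nodes and evaluating feature conditions is referred to as feature interaction, because the path to any given node depends on the conditions evaluated at the nodes before it. LightGBM lets us decide which features may interact with one another. We can define groups of feature indices; only features within the same group may interact. They cannot interact with features outside the group, and this restriction is enforced while the trees are built during training.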
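To make the rule concrete, here is a minimal sketch of how such a restriction could be evaluated for one node: given the features already used on the path from the root, it returns the features that may still be chosen for the next split. The helper name `allowed_features` is hypothetical, and the logic is a simplified model of the behaviour described above, not LightGBM's actual implementation.

```python
def allowed_features(path_features, constraints):
    """Return the feature indices that may be used for the next split.

    A group contributes its features only if it contains every feature
    already used on the path from the root to the current node.
    """
    used = set(path_features)
    allowed = set(used)                 # features already on the path stay usable
    for group in constraints:
        if used <= set(group):          # the whole path lies inside this group
            allowed |= set(group)
    return allowed

# The same groups that are used in the examples of this chapter
constraints = [[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]

print(sorted(allowed_features([], constraints)))       # root: any listed feature
print(sorted(allowed_features([3], constraints)))      # after a split on feature 3
print(sorted(allowed_features([0, 11], constraints)))  # after splits on 0 and 11
```

Under this model, once a tree has split on feature 3, only features 3 and 4 remain available further down that branch.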

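The examples below show how to enforce feature interaction constraints on LightGBM estimators. LightGBM estimators have a parameter named interaction_constraints, which accepts a list of lists; each inner list contains the indices of features that are allowed to interact with one another.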

Example 1

Here is an example of how to enforce feature interaction constraints with the LightGBM training API.

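Note that the load_boston function from sklearn.datasets was deprecated in scikit-learn 1.0 and removed in version 1.2. If the import fails, load the dataset from an external source or use an alternative dataset.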
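If load_boston is not available, one possible workaround is a synthetic dataset with the same shape (506 rows, 13 features), so that the feature indices used in interaction_constraints stay valid. The sketch below uses scikit-learn's make_regression. The choice of n_informative and noise, and the names `f0` to `f12`, are arbitrary assumptions for illustration, and the scores obtained with this data will not match the output shown in this chapter.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston data: 506 rows, 13 features
X, y = make_regression(n_samples=506, n_features=13, n_informative=8,
                       noise=10.0, random_state=42)

# Hypothetical feature names, used in place of boston.feature_names
feature_names = [f"f{i}" for i in range(X.shape[1])]

# Same split as in Example 1
X_train, X_test, Y_train, Y_test = train_test_split(X, y, train_size=0.90, random_state=42)

print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

With these variables in place, `X` and `y` can stand in for `boston.data` and `boston.target`, and `feature_names` for `boston.feature_names.tolist()`.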

# Import necessary libraries
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the Boston housing dataset
boston = load_boston()

# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

# Print the size of the training and testing sets
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

# Create LightGBM datasets
train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

# Train the LightGBM model
booster = lgb.train({
    "objective": "regression",
    "verbosity": -1,
    "metric": "rmse",
    'interaction_constraints': [[0,1,2,11,12], [3,4], [6,10], [5,9], [7,8]]
    },
    train_set=train_dataset,
    valid_sets=(test_dataset,),
    num_boost_round=10
)

# Make predictions
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Calculate and print R2 scores
print("\nR2 Test Score : %.2f" % r2_score(Y_test, test_preds))
print("R2 Train Score : %.2f" % r2_score(Y_train, train_preds))

Output

This produces the following result:

Sizes of Train or Test Datasets :  (455, 13) (51, 13) (455,) (51,)

[1]	valid_0's rmse: 7.50225
[2]	valid_0's rmse: 7.01989
[3]	valid_0's rmse: 6.58246
[4]	valid_0's rmse: 6.18581
[5]	valid_0's rmse: 5.83873
[6]	valid_0's rmse: 5.47166
[7]	valid_0's rmse: 5.19667
[8]	valid_0's rmse: 4.96259
[9]	valid_0's rmse: 4.69168
[10]	valid_0's rmse: 4.51653

R2 Test Score : 0.67
R2 Train Score : 0.69
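One way to check that a trained model respects its constraints is to walk every root-to-leaf path and confirm that the features on it all belong to a single group. The sketch below does this on a small hand-written dictionary that imitates the nested layout returned by `Booster.dump_model()` (keys `tree_info`, `tree_structure`, `split_feature`, `left_child`, `right_child`). The toy data and the helper names `iter_paths` and `violations` are assumptions made for this illustration.

```python
def iter_paths(node, path=()):
    """Yield the tuple of split-feature indices on each root-to-leaf path."""
    if "split_feature" not in node:     # leaf node: the path is complete
        yield path
        return
    path = path + (node["split_feature"],)
    yield from iter_paths(node["left_child"], path)
    yield from iter_paths(node["right_child"], path)

def violations(model_dump, constraints):
    """Return (tree_index, path) pairs whose features do not fit in any one group."""
    groups = [set(g) for g in constraints]
    bad = []
    for tree in model_dump["tree_info"]:
        for path in iter_paths(tree["tree_structure"]):
            used = set(path)
            if used and not any(used <= g for g in groups):
                bad.append((tree["tree_index"], path))
    return bad

constraints = [[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]

# Toy stand-in for a model dump: tree 0 obeys the groups, tree 1 mixes 5 and 7
toy_dump = {"tree_info": [
    {"tree_index": 0, "tree_structure": {
        "split_feature": 3,
        "left_child": {"leaf_value": 0.1},
        "right_child": {"split_feature": 4,
                        "left_child": {"leaf_value": 0.2},
                        "right_child": {"leaf_value": 0.3}}}},
    {"tree_index": 1, "tree_structure": {
        "split_feature": 5,
        "left_child": {"leaf_value": -0.1},
        "right_child": {"split_feature": 7,
                        "left_child": {"leaf_value": 0.0},
                        "right_child": {"leaf_value": 0.4}}}},
]}

print(violations(toy_dump, constraints))
```

For the model trained in Example 1, calling `violations(booster.dump_model(), constraints)` would be expected to return an empty list if the constraints were enforced.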

Example 2

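The following code trains a LightGBM model to predict house prices on the Boston dataset, this time through the scikit-learn style LGBMModel estimator with the same interaction constraints. After training, it uses the R2 score to measure how well the model performs on the training data and on the test data.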

# Import necessary libraries
import lightgbm as lgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the Boston housing dataset
boston = load_boston()

# Split the dataset into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Print the size of the training and testing sets
print("Sizes of Training and Testing Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

# Create a LightGBM model with interaction constraints and 10 estimators
booster = lgb.LGBMModel(objective="regression", n_estimators=10,
                        interaction_constraints=[[0,1,2,11,12], [3, 4], [6,10], [5,9], [7,8]])

# Train the model on the training set and validate it on the test set
booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test)],
            eval_metric="rmse")

# Make predictions on both the test and training sets
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Calculate and print the R2 score for the test and training sets
print("\nR2 Test Score : %.2f" % r2_score(Y_test, test_preds))
print("R2 Train Score : %.2f" % r2_score(Y_train, train_preds))

Output

This produces the following result:

Sizes of Training and Testing Datasets :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 8.97871	valid_0's l2: 80.6173
[2]	valid_0's rmse: 8.35545	valid_0's l2: 69.8135
[3]	valid_0's rmse: 7.93432	valid_0's l2: 62.9535
[4]	valid_0's rmse: 7.61104	valid_0's l2: 57.9279
[5]	valid_0's rmse: 7.16832	valid_0's l2: 51.3849
[6]	valid_0's rmse: 6.93182	valid_0's l2: 48.0501
[7]	valid_0's rmse: 6.57728	valid_0's l2: 43.2606
[8]	valid_0's rmse: 6.41497	valid_0's l2: 41.1518
[9]	valid_0's rmse: 6.13983	valid_0's l2: 37.6976
[10]	valid_0's rmse: 5.9864	valid_0's l2: 35.837

R2 Test Score : 0.60
R2 Train Score : 0.69