How to create a random forest classifier using Python Scikit-learn?

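Random forest is a supervised machine learning algorithm used for classification, regression, and other tasks. It builds many decision trees, each on a sample of the data. Once the trees are built, the random forest classifier collects the prediction of every tree and chooses the final answer by voting. In Scikit-learn the vote is taken by averaging the class probabilities predicted by the individual trees.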
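To make the voting step concrete, here is a minimal sketch that trains a small forest and combines the per-tree predictions by hand. The forest size of 10, the `random_state=0` seed, and the variable names are illustrative assumptions, not part of the original example. It relies on the fitted trees being available through the `estimators_` attribute.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# A small forest (assumed size: 10 trees) so per-tree results are easy to inspect
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[:1]  # the first flower in the dataset

# Class probabilities predicted by each individual tree: shape (10, 3)
per_tree = np.array([tree.predict_proba(sample)[0] for tree in forest.estimators_])

# Averaging over the trees reproduces the forest's own probabilities
averaged = per_tree.mean(axis=0)
forest_proba = forest.predict_proba(sample)[0]

print(averaged)
print(forest_proba)

# The class with the highest averaged probability is the forest's prediction
print(forest.classes_[np.argmax(averaged)], forest.predict(sample)[0])
```

The two printed probability vectors should agree, which shows that the forest's answer is the combined opinion of its trees rather than the output of any single one.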

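One of the biggest advantages of a random forest classifier is that it reduces overfitting by averaging the results of many trees. This is why it usually gives better results than a single decision tree.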
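One way to check this claim on your own data is to compare a single decision tree with a random forest under cross-validation. The sketch below is only an illustration: the 5-fold setting and the `random_state=0` seed are assumptions, and on a small, clean dataset such as Iris the gap between the two models may be small or even absent.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One fully grown tree versus a forest of 100 trees
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model (cv=5 is an assumed setting)
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print("Decision tree mean accuracy:", tree_scores.mean())
print("Random forest mean accuracy:", forest_scores.mean())
```

Cross-validation is used here instead of a single split so that the comparison does not hinge on one lucky or unlucky partition of the data.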

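Steps to create a random forest classifier

We can follow the steps below to create a random forest classifier using Python Scikit-learn:

Step 1 - Import the required libraries.

Step 2 - Load the dataset.

Step 3 - Divide the dataset into a training set and a test set.

Step 4 - Import the random forest classifier from the sklearn.ensemble module.

Step 5 - Create a dataframe of the dataset.

Step 6 - Create the random forest classifier and train the model using the fit() function.

Step 7 - Make predictions on the test set.

Step 8 - Import metrics to find the accuracy of the classifier.

Step 9 - Print the accuracy of the random forest classifier.

Example

In the example below, we will use the Iris plant dataset to build a random forest classifier.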

# Import the required libraries
import pandas as pd
from sklearn import datasets

# Load the iris dataset from sklearn
iris_clf = datasets.load_iris()
print(iris_clf.target_names)
print(iris_clf.feature_names)

# Divide the dataset into a training set and a test set
X, y = datasets.load_iris(return_X_y=True)
from sklearn.model_selection import train_test_split

# 60% training data and 40% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Import the random forest classifier from the sklearn ensemble module
from sklearn.ensemble import RandomForestClassifier

# Create a dataframe of the dataset (for inspection only; the model
# below is trained on the X_train and y_train arrays)
data = pd.DataFrame({'sepallength': iris_clf.data[:, 0],
                     'sepalwidth': iris_clf.data[:, 1],
                     'petallength': iris_clf.data[:, 2],
                     'petalwidth': iris_clf.data[:, 3],
                     'species': iris_clf.target})

# Create a random forest classifier with 100 trees
RForest_clf = RandomForestClassifier(n_estimators=100)

# Train the model on the training set using the fit() function
RForest_clf.fit(X_train, y_train)

# Predict the classes of the test set
y_pred = RForest_clf.predict(X_test)

# Import metrics for accuracy calculation
from sklearn import metrics
print("\nAccuracy of our Random Forest Classifier is:",
      metrics.accuracy_score(y_test, y_pred) * 100)

Output

It will produce the following output (the accuracy value varies from run to run because the train/test split is random):

['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Accuracy of our Random Forest Classifier is: 95.0
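Because both the train/test split and the forest itself involve randomness, the accuracy printed by the example changes between runs. If repeatable numbers are needed, one option is to fix the seeds. The following is a minimal sketch under that assumption: the `run_once` helper, the seed value 42, and the use of `stratify=y` are illustrative choices that do not appear in the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(seed):
    """Train and score a forest with a fixed seed (hypothetical helper)."""
    X, y = load_iris(return_X_y=True)
    # stratify=y keeps the three species in equal proportion in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.40, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test)) * 100


first = run_once(42)
second = run_once(42)

# With the same seed, both runs report the same accuracy
print(first, second)
```

Fixing the seed makes a result repeatable, but it does not make it more representative: a different seed can still give a somewhat different accuracy.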

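Let us use our classifier to predict the type of a flower:

# Predicting the type of flower
RForest_clf.predict([[5, 4, 3, 1]])

Output

It will produce the following output:

array([1])

Here array([1]) stands for the versicolor type.

# Predicting the type of flower
RForest_clf.predict([[5, 4, 5, 2]])

Output

It will produce the following output:

array([2])

Here array([2]) stands for the virginica type.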
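The number returned by predict() is an index into the dataset's target_names array, so the species name can be looked up directly instead of being read off by hand. The sketch below is self-contained and therefore retrains its own model; the variable names and the `random_state=0` seed are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# Sepal length, sepal width, petal length, petal width (in cm)
sample = [[5, 4, 5, 2]]

# predict() returns the class index; target_names maps it to a species name
predicted_index = clf.predict(sample)[0]
predicted_name = iris.target_names[predicted_index]
print(predicted_index, predicted_name)

# predict_proba() shows how strongly the forest favors each species
probabilities = clf.predict_proba(sample)[0]
for name, p in zip(iris.target_names, probabilities):
    print(name, round(p, 2))
```

Printing the probabilities alongside the label helps to spot borderline samples, where the forest's trees disagree about the species.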

Gaurav Leekha

Updated on: October 4, 2022
