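Inventory Demand Forecasting Using Machine Learning and Python

Introduction

Every business has to manage its inventory carefully: it must hold enough stock to meet customer demand while keeping costs as low as possible. Inventory management depends heavily on accurate demand forecasts, which help a business avoid both stockouts and overstocking. With advances in machine learning and the availability of large volumes of historical data, organizations can improve their inventory demand forecasting systems. This article looks at how to forecast inventory demand with machine learning and Python.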

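Definition

Inventory demand forecasting refers to the techniques and systems used to estimate future demand for stock or services from historical sales data, market trends, and other relevant variables. By evaluating patterns and trends in historical data, machine learning algorithms can learn to predict future demand. This lets a business optimize its inventory levels and make informed decisions. Put simply, inventory forecasting means estimating how many toys we will need based on how many we sold before, and we use machine learning models written in Python to help make those estimates.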

Syntax

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.<model> import <Model>
# Load the inventory data from a CSV file
data = pd.read_csv('inventory_data.csv')
# Preprocess the data here if required (missing values, data types, encoding)
# Split the data into features and target variable
X = data[['feature1', 'feature2', ...]] # Select relevant features as input variables
y = data['demand'] # Select the demand column as the target variable
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the model
model = <Model>() # Initialize the machine learning model (e.g., Linear Regression, Random Forest, etc.)
model.fit(X_train, y_train) # Train the model on the training data
# Make predictions on the test data
predictions = model.predict(X_test)

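Import the required libraries, such as matplotlib, numpy, scikit-learn, and pandas.
Then load the historical sales data into a pandas DataFrame, a tabular data structure.
To prepare the data for analysis, handle missing values, convert data types, encode categorical variables, and split the data into training and test sets.
Extract the features the model needs to recognize patterns and predict accurately. This may include adding lagged values, averaging over several time periods (rolling means), or accounting for external variables such as holidays.
Choose a suitable machine learning model, train it on the training data, and evaluate it with a relevant metric such as mean squared error (MSE) or root mean squared error (RMSE).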
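The feature extraction step described above can be sketched roughly as follows. This is a minimal illustration, not a prescribed recipe: the column names `date` and `demand`, the helper `add_demand_features`, the lag lengths, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def add_demand_features(df, holidays):
    """Return a copy of df with lag, rolling-mean, and holiday features (hypothetical helper)."""
    out = df.copy()
    # Lag features: demand observed 1 and 7 days earlier
    out["lag_1"] = out["demand"].shift(1)
    out["lag_7"] = out["demand"].shift(7)
    # Mean of the previous 7 days; shift(1) keeps today's demand out of its own feature
    out["rolling_mean_7"] = out["demand"].shift(1).rolling(window=7).mean()
    # External variable: 1 if the date is a listed holiday, else 0
    out["is_holiday"] = out["date"].isin(holidays).astype(int)
    # The first rows have no history for the lags, so drop them
    return out.dropna().reset_index(drop=True)

# Synthetic daily demand, for illustration only
dates = pd.date_range("2023-01-01", periods=30, freq="D")
sales = pd.DataFrame({"date": dates, "demand": np.arange(100, 130)})
holidays = pd.to_datetime(["2023-01-16"])  # assumed holiday date

features = add_demand_features(sales, holidays)
print(features.head())
```

Shifting before taking the rolling mean matters: without it, each row's feature would include the very value the model is supposed to predict.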

Algorithm

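Step 1 - Load the historical sales data.
Step 2 - Preprocess the data by handling missing values, converting data types, and splitting the data into training and test sets.
Step 3 - Perform feature engineering by extracting relevant features.
Step 4 - Select a suitable machine learning algorithm and train it on the training data.
Step 5 - Evaluate the model with an appropriate metric and generate predictions on new data.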
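The five steps can be strung together as in the sketch below. It runs on synthetic data so that it is self-contained; the `price` column, the linear demand relationship, the injected missing value, and the choice of `LinearRegression` are assumptions for illustration only. The data is split chronologically before training so that the model is evaluated on the most recent period.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Step 1 - load historical sales (synthetic here; a real project would read a file)
rng = np.random.default_rng(0)
n_days = 200
sales = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=n_days, freq="D"),
    "price": rng.uniform(8.0, 12.0, n_days),
})
sales["demand"] = 300 - 15 * sales["price"] + rng.normal(0, 5, n_days)
sales.loc[10, "demand"] = np.nan  # simulate one missing record

# Step 2 - preprocess: ensure the date type and fill the missing value
sales["date"] = pd.to_datetime(sales["date"])
sales["demand"] = sales["demand"].interpolate()

# Step 3 - feature engineering: calendar and lag features
sales["day_of_week"] = sales["date"].dt.dayofweek
sales["lag_1"] = sales["demand"].shift(1)
sales = sales.dropna().reset_index(drop=True)
feature_cols = ["price", "day_of_week", "lag_1"]
X, y = sales[feature_cols], sales["demand"]

# Step 4 - split chronologically (no shuffling) and train the model
split = int(0.8 * len(sales))
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
model = LinearRegression().fit(X_train, y_train)

# Step 5 - evaluate on the held-out period
predictions = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
print(f"RMSE on the last {len(y_test)} days: {rmse:.2f} units")
```

RMSE is reported in the same units as demand, which makes it easier to interpret than MSE when discussing stock levels.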

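Approaches

Approach 1 - Time series forecasting with ARIMA.
Approach 2 - Supervised learning with Random Forest regression.

Approach 1: Time series forecasting with ARIMA

Example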

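# Import libraries
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

data = pd.read_csv('sales_data.csv')
# ARIMA models a single series; this assumes the same 'demand' column used in Approach 2
series = data['demand']
# Preprocess data (if required)
# Split chronologically into training and testing sets
split = int(0.8 * len(series))
train_data = series[:split]
test_data = series[split:]
# Fit ARIMA model; the order (1, 1, 1) is only a placeholder starting point
p, d, q = 1, 1, 1
model = ARIMA(train_data, order=(p, d, q)).fit()
# Forecast one value for each observation in the test period
predictions = model.forecast(steps=len(test_data))
# Evaluate model with mean squared error
mse = ((predictions.to_numpy() - test_data.to_numpy()) ** 2).mean()
print(f'MSE: {mse:.2f}')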

Output

Day     Actual  Predicted
Day 1   100     105
Day 2   150     140
Day 3   120     125
Day 4   180     170
Day 5   90      95

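In this example, we have the actual demand for a particular product, in units, over five days. The ARIMA model is trained on historical data and used to generate the predicted values.

The table shows the actual demand for each day next to the corresponding ARIMA forecast. Each prediction is within 5 to 10 units of the actual value, so the model captures the overall pattern of demand. Some discrepancy between actual and predicted values remains, because forecasting future demand is inherently difficult.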

Approach 2: Supervised learning with Random Forest regression

Example

# Import libraries
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

data = pd.read_csv('sales_data.csv')
# Preprocess data (if required); all feature columns must be numeric
# Split into features and target
X = data.drop('demand', axis=1)
y = data['demand']
# Split into training and testing sets (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the Random Forest regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Evaluate model
mse = mean_squared_error(y_test, predictions)
print(f'MSE: {mse:.2f}')

Output

Day     Actual  Predicted
Day 1   100     95
Day 2   150     155
Day 3   120     115
Day 4   180     175
Day 5   90      85

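The example above shows the actual demand for a particular product, in units, over five days. The predicted values come from Approach 2, a Random Forest regression model trained on historical data.

For each day, the table shows the actual demand next to the value predicted by the Random Forest regression model. Each prediction is within 5 units of the actual value, so the forecasts track the observed demand closely. Actual and predicted values still differ slightly, because future demand is influenced by many factors that the model cannot fully capture.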
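To put numbers on how close the forecasts in the two output tables are, the sketch below computes MAE, MSE, and RMSE directly from the table values. The helper name `error_metrics` is an assumption for illustration; the figures are taken from the tables as given.

```python
import numpy as np

# Values copied from the two output tables
actual = np.array([100, 150, 120, 180, 90])
arima_pred = np.array([105, 140, 125, 170, 95])
forest_pred = np.array([95, 155, 115, 175, 85])

def error_metrics(y_true, y_pred):
    """Return MAE, MSE, and RMSE for one set of predictions (hypothetical helper)."""
    errors = y_pred - y_true
    mse = float(np.mean(errors ** 2))
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
    }

results = {
    "ARIMA": error_metrics(actual, arima_pred),
    "Random Forest": error_metrics(actual, forest_pred),
}
for name, m in results.items():
    print(f"{name}: MAE={m['MAE']:.2f}, MSE={m['MSE']:.2f}, RMSE={m['RMSE']:.2f}")
# ARIMA: MAE=7.00, MSE=55.00, RMSE=7.42
# Random Forest: MAE=5.00, MSE=25.00, RMSE=5.00
```

Five data points are far too few to rank the two approaches; the calculation only shows how the metrics are derived from a table of actual and predicted values.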

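Conclusion

For any business, accurate forecasts are essential to meeting customer demand and deadlines and to keeping inventory at the right level, which is why inventory demand forecasting is indispensable. By using machine learning techniques, businesses can raise customer satisfaction, lower costs, and improve their processes. In the simple terms used earlier: we estimate how many items we will need, so that there are always enough toys in stock.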

Arpana Jain

Updated on: 11 October 2023
