CIFAR-10 Image Classification in TensorFlow

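Image classification is a fundamental task in computer vision: identifying and categorizing images based on their content. CIFAR-10 is a well-known dataset of 60,000 32×32 color images in 10 classes, with 6,000 images per class.

TensorFlow is a popular open-source machine learning framework that provides tools and APIs for building and training machine learning models. It is widely used for deep learning and has a large developer community. TensorFlow includes Keras, a high-level API that makes it easy to build and train deep neural networks.

In this tutorial, we will use TensorFlow and Keras to perform image classification on CIFAR-10.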

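Loading the Data

The first step in any machine learning project is preparing the data. Here we use the CIFAR-10 dataset, which can be downloaded through TensorFlow's built-in datasets module.

Let's start by importing the necessary modules -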

import tensorflow as tf
from tensorflow.keras.datasets import cifar10

Next, we load the CIFAR-10 dataset with the load_data() function from the cifar10 module -

# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

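This code loads the training and test images and their labels into four NumPy arrays. The train_images and test_images arrays hold the images themselves, while train_labels and test_labels hold the corresponding labels (integers from 0 to 9, one for each of the 10 classes).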
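As a quick sanity check, it can help to inspect the shapes and dtypes of these arrays before going further. The sketch below uses small zero-filled stand-in arrays with the same per-image layout as CIFAR-10 (uint8 images of shape 32×32×3 and labels of shape (N, 1)), so it runs without downloading anything. The name describe_split is a hypothetical helper, not part of TensorFlow.

```python
import numpy as np

def describe_split(images, labels):
    """Summarize one dataset split (hypothetical helper for illustration)."""
    return {
        "num_images": images.shape[0],
        "image_shape": images.shape[1:],
        "image_dtype": str(images.dtype),
        "label_shape": labels.shape,
    }

# Stand-in arrays with the same per-image layout as CIFAR-10.
# With the real data, pass train_images and train_labels instead.
stand_in_images = np.zeros((8, 32, 32, 3), dtype=np.uint8)
stand_in_labels = np.zeros((8, 1), dtype=np.uint8)

summary = describe_split(stand_in_images, stand_in_labels)
print(summary)
```

The (N, 1) label shape is why later code indexes a label as train_labels[i][0] rather than train_labels[i].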

It is good practice to visualize a few examples from the dataset to get a sense of what we are working with -

import matplotlib.pyplot as plt
import numpy as np

# Define the class names for visualization purposes
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer','dog', 'frog', 'horse', 'ship', 'truck']

# Plot a few examples
plt.figure(figsize=(10,10))
for i in range(25):
   plt.subplot(5,5,i+1)
   plt.xticks([])
   plt.yticks([])
   plt.grid(False)
   # A colormap has no effect on RGB images, so none is passed here
   plt.imshow(train_images[i])
   # Labels have shape (N, 1), hence the extra [0] index
   plt.xlabel(class_names[train_labels[i][0]])
plt.show()

This displays a 5×5 grid of 25 images from the training set, each with its label.

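Preprocessing the Data

Before we can train a model on CIFAR-10, we need to preprocess the data. There are two main preprocessing steps -

Normalizing the pixel values

Pixel values in the images range from 0 to 255. Scaling them to the range 0 to 1 keeps the inputs small and generally helps training converge -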

# Normalize the pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
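One detail worth knowing: dividing a uint8 NumPy array by 255.0 produces a float64 array, which takes eight times the memory of the original. Casting to float32 first is a common alternative. The sketch below illustrates this on a tiny random stand-in batch rather than the real dataset.

```python
import numpy as np

# A tiny stand-in batch: 2 RGB images of 32x32 pixels stored as uint8.
batch = np.random.randint(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)

scaled64 = batch / 255.0                     # NumPy promotes the result to float64
scaled32 = batch.astype(np.float32) / 255.0  # result stays float32

print(scaled64.dtype, scaled64.nbytes)
print(scaled32.dtype, scaled32.nbytes)
print(float(scaled32.min()) >= 0.0, float(scaled32.max()) <= 1.0)
```

Either version works for training; the float32 variant simply uses half the memory of the float64 one.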

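One-hot encoding the labels

The labels in CIFAR-10 are integers from 0 to 9. Because we will train with the categorical_crossentropy loss, which expects one target vector per image, we need to convert these integers to one-hot encoded vectors. TensorFlow provides a convenient function for this -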

train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)
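To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does to a few labels. The one_hot function is an illustrative stand-in, not the TensorFlow implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """One-hot encode integer labels of shape (N,) or (N, 1)."""
    flat = np.asarray(labels).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[flat]

labels = np.array([[3], [0], [9]])  # same (N, 1) layout as the CIFAR-10 labels
encoded = one_hot(labels)

print(encoded.shape)
print(encoded[0])
```

Each row contains a single 1.0 at the index of its class and zeros elsewhere, so np.argmax along each row recovers the original integer label.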

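Building the Model

With the data preprocessed, we can build our model. We will use a convolutional neural network (CNN), a type of neural network that is particularly well suited to image classification.

Here is the architecture we will use for our CIFAR-10 model -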

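Convolutional layers - We use three convolutional layers with 32, 64, and 64 filters of size 3×3, each with a ReLU activation. The first two are each followed by a 2×2 max pooling layer. The convolutional layers learn features from the input images, while the max pooling layers downsample their output.
Flatten layer - We then flatten the output of the last convolutional layer into a one-dimensional vector, which is passed to the fully connected layer.
Fully connected layer - We use one fully connected layer with 64 neurons and a ReLU activation. It combines the features learned by the convolutional layers before classification.
Output layer - Finally, we add an output layer with 10 neurons (one per class) and a softmax activation, which produces the final class probabilities.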

Here is the code to build this model -

# Define the CNN model
model = tf.keras.models.Sequential([
   tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
   tf.keras.layers.MaxPooling2D((2, 2)),
   tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
   tf.keras.layers.MaxPooling2D((2, 2)),
   tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
   tf.keras.layers.Flatten(),
   tf.keras.layers.Dense(64, activation='relu'),
   tf.keras.layers.Dense(10, activation='softmax')
])
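To see how the tensor shapes and parameter counts work out for the model above, the arithmetic can be traced by hand. The sketch below assumes the Keras defaults used in that code: convolutions with stride 1 and no padding, and pooling with a stride equal to the pool size. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units

size = 32
size = conv_out(size)   # Conv2D(32): 32 -> 30
size = pool_out(size)   # MaxPooling2D: 30 -> 15
size = conv_out(size)   # Conv2D(64): 15 -> 13
size = pool_out(size)   # MaxPooling2D: 13 -> 6
size = conv_out(size)   # Conv2D(64): 6 -> 4
flat_units = size * size * 64  # Flatten: 4 * 4 * 64

total_params = (
    conv_params(3, 3, 32)
    + conv_params(3, 32, 64)
    + conv_params(3, 64, 64)
    + dense_params(flat_units, 64)
    + dense_params(64, 10)
)
print(size, flat_units, total_params)  # 4 1024 122570
```

These figures can be compared against the output of model.summary() once the model is built.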

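Compiling and Training the Model

Now that we have defined the model, we need to compile it and train it on the CIFAR-10 dataset. The compile() method specifies the loss function, optimizer, and metrics to use during training.

Here is the code to compile the model -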

# Compile the model
model.compile(optimizer='adam',
   loss='categorical_crossentropy',
   metrics=['accuracy'])

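We use the adam optimizer, a popular variant of stochastic gradient descent (SGD) that adapts the learning rate during training. We also use the categorical_crossentropy loss function, a common choice for multi-class classification problems with one-hot encoded labels. Finally, we specify the accuracy metric, which is reported during training and evaluation.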
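To build some intuition for the loss values that training reports, here is a small NumPy sketch of softmax followed by categorical cross-entropy. It is an illustration of the formulas, not the Keras implementation, and the input values are made up.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-example cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One example with three made-up raw scores; the true class is index 0.
logits = np.array([[2.0, 1.0, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0]])
probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.round(3), loss.round(3))

# Reference point: a uniform guess over 10 classes gives a loss of ln(10).
uniform_loss = categorical_crossentropy(np.eye(10)[[0]], np.full((1, 10), 0.1))
print(uniform_loss.round(3))
```

A loss near ln(10) ≈ 2.303 therefore means the model is doing no better than guessing uniformly among the 10 classes, and lower values mean more probability is being assigned to the correct class.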

To train the model, we call the fit() method and pass in the training data and labels -

# Train the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

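In the code above, we train the model on the training data for 10 epochs and validate it on the test data after each epoch. The `fit()` method returns a `History` object containing information about the training process, such as the loss and accuracy values for each epoch.

Here is the output produced during training -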

Epoch 1/10
1563/1563 [==============================] - 55s 34ms/step - loss: 1.7739 - accuracy: 0.3845 - val_loss: 1.4289 - val_accuracy: 0.4986
Epoch 2/10
1563/1563 [==============================] - 62s 40ms/step - loss: 1.2955 - accuracy: 0.5384 - val_loss: 1.2574 - val_accuracy: 0.5585
Epoch 3/10
1563/1563 [==============================] - 57s 36ms/step - loss: 1.1365 - accuracy: 0.6024 - val_loss: 1.1261 - val_accuracy: 0.6079
Epoch 4/10
1563/1563 [==============================] - 56s 36ms/step - loss: 1.0434 - accuracy: 0.6355 - val_loss: 1.0228 - val_accuracy: 0.6490
Epoch 5/10
1563/1563 [==============================] - 57s 36ms/step - loss: 0.9579 - accuracy: 0.6663 - val_loss: 1.0293 - val_accuracy: 0.6466
Epoch 6/10
1563/1563 [==============================] - 56s 36ms/step - loss: 0.8967 - accuracy: 0.6868 - val_loss: 1.0676 - val_accuracy: 0.6463
Epoch 7/10
1563/1563 [==============================] - 50s 32ms/step - loss: 0.8372 - accuracy: 0.7088 - val_loss: 1.0286 - val_accuracy: 0.6571
Epoch 8/10
1563/1563 [==============================] - 56s 36ms/step - loss: 0.7923 - accuracy: 0.7266 - val_loss: 1.0569 - val_accuracy: 0.6498
Epoch 9/10
1563/1563 [==============================] - 50s 32ms/step - loss: 0.7490 - accuracy: 0.7413 - val_loss: 1.0367 - val_accuracy: 0.6585
Epoch 10/10
1563/1563 [==============================] - 59s 38ms/step - loss: 0.7065 - accuracy: 0.7548 - val_loss: 1.0404 - val_accuracy: 0.6713
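Two things in the log above can be checked with a little arithmetic: where the step counts come from, and which epoch had the best validation metrics. The sketch below assumes the default batch size of 32 that fit() uses when batch_size is not given, and copies the validation values from the log. With a live run, the same lists are available as history.history['val_loss'] and history.history['val_accuracy'].

```python
import math

batch_size = 32  # Keras default for fit() and evaluate() when not specified
train_steps = math.ceil(50000 / batch_size)  # 50,000 training images
test_steps = math.ceil(10000 / batch_size)   # 10,000 test images
print(train_steps, test_steps)  # 1563 313

# Validation metrics copied from the training log above, one value per epoch.
val_loss = [1.4289, 1.2574, 1.1261, 1.0228, 1.0293,
            1.0676, 1.0286, 1.0569, 1.0367, 1.0404]
val_accuracy = [0.4986, 0.5585, 0.6079, 0.6490, 0.6466,
                0.6463, 0.6571, 0.6498, 0.6585, 0.6713]

# Epochs are numbered from 1 in the log, so add 1 to the list index.
best_loss_epoch = val_loss.index(min(val_loss)) + 1
best_acc_epoch = val_accuracy.index(max(val_accuracy)) + 1
print(best_loss_epoch, best_acc_epoch)  # 4 10
```

The validation loss is lowest at epoch 4 and stays roughly flat afterwards while the training loss keeps falling, which is a common sign that the model is starting to overfit.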

Evaluating the Model

After training the model, we can evaluate its performance on the test set with the evaluate() method -

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

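This prints the model's test accuracy, which indicates how well it classifies images that were not used for training.

We can also use Matplotlib to plot the training and validation accuracy over time -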
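For reference, the accuracy figure is simply the fraction of images whose highest-probability class matches the true class. A minimal NumPy sketch with made-up probabilities for a three-class problem (accuracy_from_probs is a hypothetical helper name):

```python
import numpy as np

def accuracy_from_probs(y_true_one_hot, y_prob):
    """Fraction of rows where the most probable class equals the true class."""
    predicted = np.argmax(y_prob, axis=1)
    actual = np.argmax(y_true_one_hot, axis=1)
    return float(np.mean(predicted == actual))

# Four made-up examples with true classes 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.7, 0.2, 0.1],  # predicted 0, correct
    [0.1, 0.8, 0.1],  # predicted 1, correct
    [0.3, 0.4, 0.3],  # predicted 1, wrong (true class is 2)
    [0.2, 0.5, 0.3],  # predicted 1, correct
])
acc = accuracy_from_probs(y_true, y_prob)
print(acc)  # 0.75
```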

# Plot the training and validation accuracy over time
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0, 1])  # start at 0 so that first-epoch values below 0.5 stay visible
plt.legend(loc='lower right')
plt.show()

The evaluate() call shown earlier prints output similar to the following -

313/313 [==============================] - 3s 8ms/step - loss: 1.0404 - accuracy: 0.6713
Test accuracy: 0.6712999939918518

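The plot shows the training and validation accuracy over the 10 training epochs. Our model reaches about 75% training accuracy and about 67% validation accuracy, which is reasonable for such a small network trained for only 10 epochs. The test accuracy matches the final validation accuracy because the test set was also used as the validation data. The gap between the training and validation curves widens in the later epochs, which suggests the model is beginning to overfit.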

Making Predictions

After training and evaluating the model, we can use it to make predictions on new images. Here is an example -

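# Load a new image
new_image = plt.imread(r'C:\Users\Leekha\Desktop\sparrow.jpg')
# Resize to 32x32 and scale to [0, 1] to match the training data
# (imread returns values from 0 to 255 for JPEG files)
new_image = tf.image.resize(new_image, (32, 32)) / 255.0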

# Reshape the image to match the input shape of the model
new_image = np.expand_dims(new_image, axis=0)

# Make a prediction
predictions = model.predict(new_image)

# Get the index of the predicted class
predicted_class_index = np.argmax(predictions)

# Map the index to the corresponding class name
predicted_class_name = class_names[predicted_class_index]

# Print the predicted class name
print('Predicted class:', predicted_class_name)

This gives the following prediction -

1/1 [==============================] - 0s 32ms/step
Predicted class: bird

Let's make one more prediction with the trained model -

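# Load a new image
new_image = plt.imread(r'C:\Users\Leekha\Desktop\car.jpg')
# Resize to 32x32 and scale to [0, 1] to match the training data
# (imread returns values from 0 to 255 for JPEG files)
new_image = tf.image.resize(new_image, (32, 32)) / 255.0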

# Reshape the image to match the input shape of the model
new_image = np.expand_dims(new_image, axis=0)

# Make a prediction
predictions = model.predict(new_image)

# Get the index of the predicted class
predicted_class_index = np.argmax(predictions)

# Map the index to the corresponding class name
predicted_class_name = class_names[predicted_class_index]

# Print the predicted class name
print('Predicted class:', predicted_class_name)

This gives the following prediction -

1/1 [==============================] - 0s 19ms/step
Predicted class: automobile

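In the code blocks above, we first load the new image with plt.imread and resize it to match the model's input shape. We then expand the image's dimensions to add the batch axis the model expects. Because the model was trained on pixel values scaled to the range 0 to 1, new images need the same scaling before prediction.

Finally, we use the model's predict method to get the predicted class probabilities for the image. We use np.argmax to find the index of the most probable class and look up the corresponding name in the class_names list. The predicted class name is then printed to the console.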
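Since predict returns a probability for every class, it can be informative to look at more than the single top class. The sketch below ranks the three most probable classes with NumPy. The probability values are made up to stand in for one row of model.predict(...) output, and top_k is a hypothetical helper name.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def top_k(probabilities, names, k=3):
    """Return the k most probable (class name, probability) pairs, best first."""
    probabilities = np.asarray(probabilities).reshape(-1)
    order = np.argsort(probabilities)[::-1][:k]  # indices sorted by descending probability
    return [(names[i], float(probabilities[i])) for i in order]

# Made-up probabilities standing in for the output of model.predict(new_image).
example_predictions = np.array(
    [[0.02, 0.05, 0.55, 0.10, 0.08, 0.06, 0.04, 0.03, 0.05, 0.02]]
)

ranked = top_k(example_predictions, class_names)
for name, prob in ranked:
    print(f'{name}: {prob:.2f}')
```

A top probability that is only slightly ahead of the runner-up is a hint that the model is unsure about the image.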

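Conclusion

In this article, we explored how to use TensorFlow and Keras to perform image classification on the CIFAR-10 dataset. We built a convolutional neural network (CNN) and trained it on CIFAR-10, reaching a test accuracy of about 67%. We also used Matplotlib to plot the training and validation accuracy over time.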

Gaurav Leekha

Updated on: 20 February 2024
