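Multiclass Image Classification Using Transfer Learning

Introduction

Image classification is one of the most common deep learning tasks on image data. The development of new high-performance machine learning frameworks has made it an increasingly active research topic. Classification can be binary, with two classes of images, or multiclass, with more than two. This article explores multiclass image classification using transfer learning.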

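Multiclass Image Classification

Advances in artificial neural networks, and the development of convolutional neural networks (CNNs) in particular, have made complex operations on images practical and have driven progress in tasks such as multiclass image classification, image segmentation, and image detection.

Multiclass image classification is one of the most basic yet powerful computer vision tasks a CNN can perform. Here the dataset contains more than two classes of images, each labeled with its class (for example, CIFAR or Fashion MNIST).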

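For classification, we can prepare our own labeled dataset or download an existing image dataset such as CIFAR-10. When each class has only a few images, preprocessing can include image augmentation to increase the variation in the dataset.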
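To make the idea of augmentation concrete, here is a minimal NumPy sketch of one common transform, a random horizontal flip. The function name `random_horizontal_flip` and the random stand-in batch are illustrative assumptions, not part of the original article, which uses Keras for augmentation.

```python
import numpy as np


def random_horizontal_flip(images, rng, prob=0.5):
    """Flip each image left-to-right with probability `prob`.

    `images` has shape (batch, height, width, channels).
    """
    flip_mask = rng.random(len(images)) < prob
    augmented = images.copy()
    # Reversing the width axis (axis 2 of the batch) mirrors the selected images.
    augmented[flip_mask] = augmented[flip_mask][:, :, ::-1, :]
    return augmented


# Stand-in batch of eight random 32x32 RGB images (hypothetical data).
rng = np.random.default_rng(seed=0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
flipped = random_horizontal_flip(batch, rng)
```

Each call produces a slightly different version of the same batch, which is how augmentation adds variation without collecting new images.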

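To train a model, we can use any deep learning framework (such as TensorFlow or PyTorch) to build the architecture from scratch, or we can use a ready-made backbone architecture (such as VGG16 or ResNet). The advantage of the latter is that we do not have to design the architecture ourselves; we only need to fine-tune the model or change the last one or two layers to suit our use case. This is where transfer learning comes in, an intuitive technique for training image models.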

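What Is Transfer Learning and Why Does It Matter?

Transfer learning is a research problem in machine learning. It stores the knowledge gained while solving one problem and applies it to a different but related problem. For example, knowledge gained while learning to recognize cats can be applied when trying to recognize cheetahs. In deep learning, transfer learning is a technique in which a neural network is first trained on a problem similar to the one being solved. It can shorten training time and can lead to lower generalization error. We can take a model pretrained on another dataset, such as ImageNet, and modify its last layers to suit our task, which saves the time, effort, and resources needed to train a model from scratch. Such pretrained models already encode a large number of image patterns learned from extensive training on their original data.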

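Code Implementation

In this example, we use the CIFAR-10 dataset for multiclass classification and adapt the VGG19 network for transfer learning.

Dataset Used

The dataset is CIFAR-10. It comes from the Canadian Institute for Advanced Research (CIFAR). It contains 60,000 32×32 color images in 10 classes, with 6,000 images per class. The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset has 50,000 training images and 10,000 test images, and it can be imported from Keras.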
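The dataset stores each label as an integer from 0 to 9. A small lookup such as the following sketch turns a label back into a readable class name. The names `CIFAR10_CLASSES` and `label_to_name` are illustrative helpers, not part of Keras.

```python
# Class names in CIFAR-10 label order (index 0 through 9).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

IMAGES_PER_CLASS = 6000
TOTAL_IMAGES = len(CIFAR10_CLASSES) * IMAGES_PER_CLASS  # 10 classes x 6,000 images


def label_to_name(label):
    """Map an integer label (0-9) to its class name."""
    return CIFAR10_CLASSES[int(label)]
```

For instance, `label_to_name(3)` gives `"cat"`.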

Implementation Using the Keras API

Example

import numpy as np
from sklearn.model_selection import train_test_split
from keras import Sequential
from keras.datasets import cifar10
from keras.applications import VGG19  # pretrained base for transfer learning
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau
from keras.layers import Flatten, Dense, Dropout
from keras.utils import to_categorical

# Download the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Hold out 30% of the training images as a validation set
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.3)

# Dimensions of each split
print((x_train.shape, y_train.shape))
print((x_val.shape, y_val.shape))
print((x_test.shape, y_test.shape))

# One-hot encode the integer labels
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)
y_test = to_categorical(y_test)

# Verify the dimensions after one-hot encoding
print((x_train.shape, y_train.shape))
print((x_val.shape, y_val.shape))
print((x_test.shape, y_test.shape))

# Image data augmentation for the training set
train_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True, zoom_range=0.1)
# Validation and test images are left unaugmented so metrics reflect the real data
val_generator = ImageDataGenerator()
test_generator = ImageDataGenerator()

# fit() computes dataset statistics; it is only required when feature-wise
# normalization or ZCA whitening is enabled
train_generator.fit(x_train)
val_generator.fit(x_val)
test_generator.fit(x_test)

# Learning rate annealer for use during training; the monitored name assumes
# the model is compiled with metrics=['accuracy']
lrr = ReduceLROnPlateau(monitor='val_accuracy', factor=0.01, patience=3, min_lr=1e-5)

# VGG19 convolutional base with ImageNet weights and without its original classifier
base_model = VGG19(include_top=False, weights='imagenet', input_shape=(32, 32, 3))

# Stack the new classification layers on top of the base model
model = Sequential()
model.add(base_model)
model.add(Flatten())

# Model summary so far
model.summary()

# Dense layers that perform the actual classification
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))

# Output layer: one softmax unit per class
model.add(Dense(10, activation='softmax'))

# Final model summary
model.summary()

# Make predictions. Note: this listing does not compile or train the model, so
# the dense layers are still randomly initialized and the predictions are not
# meaningful until model.compile() and model.fit() have been run.
predict_y = model.predict(x_test)
y_pred = np.argmax(predict_y, axis=1)
y_true = np.argmax(y_test, axis=1)

Output

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 [==============================] - 2s 0us/step
((35000, 32, 32, 3), (35000, 1))
((15000, 32, 32, 3), (15000, 1))
((10000, 32, 32, 3), (10000, 1))
((35000, 32, 32, 3), (35000, 10))
((15000, 32, 32, 3), (15000, 10))
((10000, 32, 32, 3), (10000, 10))
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80134624/80134624 [==============================] - 1s 0us/step
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 vgg19 (Functional)          (None, 1, 1, 512)         20024384
 flatten (Flatten)           (None, 512)               0
=================================================================
Total params: 20,024,384
Trainable params: 20,024,384
Non-trainable params: 0
_________________________________________________________________
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 vgg19 (Functional)          (None, 1, 1, 512)         20024384
 flatten (Flatten)           (None, 512)               0
 dense (Dense)               (None, 1024)              525312
 dense_1 (Dense)             (None, 512)               524800
 dense_2 (Dense)             (None, 256)               131328
 dropout (Dropout)           (None, 256)               0
 dense_3 (Dense)             (None, 128)               32896
 dense_4 (Dense)             (None, 10)                1290
=================================================================
Total params: 21,240,010
Trainable params: 21,240,010
Non-trainable params: 0
_________________________________________________________________
313/313 [==============================] - 158s 503ms/step
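The listing ends by computing `y_pred` and `y_true` but does not evaluate them. A confusion matrix and overall accuracy are a natural next step. The sketch below computes both with NumPy alone; the function name `confusion_counts` and the six toy labels are made up for illustration and are not results from the model above.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Return a matrix whose entry [i, j] counts samples of true class i predicted as j."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix


# Toy labels for three classes (hypothetical, not model output).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

matrix = confusion_counts(y_true, y_pred, num_classes=3)
# Correct predictions lie on the diagonal.
accuracy = np.trace(matrix) / matrix.sum()
```

For the CIFAR-10 model, the same function could be called with `num_classes=10` on the `y_true` and `y_pred` arrays from the listing once the model has been trained.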

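Conclusion

Multiclass image classification has proved very useful to the deep learning community. As one of the fundamental tasks in computer vision, it is widely used in the AI industry as a building block, even for complex computer vision applications such as image segmentation, detection, and visual recognition.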

Mithilesh Pradhan

Updated on: December 1, 2022
