How can Python and TensorFlow be used to iterate through a dataset and display sample data?

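TensorFlow is a machine learning framework provided by Google. It is an open-source framework that is used together with Python to implement algorithms, deep learning applications, and much more. It is used for both research and production. It includes optimizations that help carry out complicated mathematical operations quickly, and it operates on multidimensional arrays in a way that will be familiar to NumPy users. These multidimensional arrays are also known as “tensors”. The framework supports working with deep neural networks. It is highly scalable and gives access to many popular datasets. It can run computations on GPUs and manages resources automatically. It comes with a large collection of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and create applications that predict the relevant characteristics of the respective datasets.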

The “tensorflow” package can be installed with pip, for example on Windows, using the following command:

pip install tensorflow

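Tensors are the data structure used in TensorFlow. They flow along the edges of a graph whose nodes are operations; this graph is known as the “data flow graph”. A tensor is essentially a multidimensional array or list, and it can be identified by three main attributes: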

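Rank − The number of dimensions of the tensor, also called its order. A tensor can be one-dimensional, two-dimensional, or n-dimensional.
Type − The data type associated with the elements of the tensor, for example float32 or int32.
Shape − The number of elements along each dimension. For a two-dimensional tensor, this is the number of rows and columns.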
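
The three attributes can be inspected directly on a tensor. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution; the variable name `matrix` and its values are made up for illustration.

```python
import tensorflow as tf

# A hypothetical 2 x 3 tensor built from Python floats
matrix = tf.constant([[1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])

# Rank: the number of dimensions (2 for a matrix)
print("Rank:", tf.rank(matrix).numpy())

# Type: the data type of the elements (Python floats become float32 by default)
print("Type:", matrix.dtype)

# Shape: the number of elements along each dimension (2 rows, 3 columns)
print("Shape:", matrix.shape)
```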

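We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and offers free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.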
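
Before running the example, it can be useful to check which TensorFlow version the runtime provides and whether a GPU is visible. This is a small sketch rather than part of the original tutorial; it assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
import tensorflow as tf

# Report the installed TensorFlow version
print("TensorFlow version:", tf.__version__)

# List the GPUs visible to TensorFlow; the list is empty on a CPU-only runtime
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", len(gpus))
```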

Example

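# raw_train_ds, train_dir, dataset_dir, batch_size and seed are assumed to be
# defined in the earlier steps of the source tutorial.
from tensorflow.keras import preprocessing

print("Iterating through the training data")
# Show which integer index corresponds to which class label
for i, label in enumerate(raw_train_ds.class_names):
    print("Label", i, "maps to", label)
print("The training parameters have been defined")
# Hold out 25% of the files in train_dir as the validation set
raw_val_ds = preprocessing.text_dataset_from_directory(
    train_dir,
    batch_size=batch_size,
    validation_split=0.25,
    subset='validation',
    seed=seed)
print("The test dataset is being prepared")
test_dir = dataset_dir/'test'
# Load the test set from its own directory
raw_test_ds = preprocessing.text_dataset_from_directory(
    test_dir, batch_size=batch_size)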

Code credit − https://tensorflowcn.cn/tutorials/load_data/text

Output

Iterating through the training data
Label 0 maps to csharp
Label 1 maps to java
Label 2 maps to javascript
Label 3 maps to python
The training parameters have been defined
Found 8000 files belonging to 4 classes.
Using 2000 files for validation.
The test dataset is being prepared
Found 8000 files belonging to 4 classes.
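
The “Using 2000 files for validation” line follows from the validation_split value. The arithmetic can be sketched in plain Python as below, assuming the validation count is the total number of files multiplied by the split fraction and truncated to an integer; `split_counts` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def split_counts(total_files, validation_split):
    """Return (training_count, validation_count) for a given split fraction."""
    # Assumption: the validation share is truncated to a whole number of files
    validation_count = int(total_files * validation_split)
    training_count = total_files - validation_count
    return training_count, validation_count


train_count, val_count = split_counts(8000, 0.25)
print("Training files:", train_count)    # 6000
print("Validation files:", val_count)    # 2000
```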

Explanation

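The class names of the training dataset are iterated, and the integer index of each label is printed.
The number of files found for the validation and test sets is displayed on the console.
The “text_dataset_from_directory” utility is used to load the data from the directories; with validation_split=0.25, 2000 of the 8000 files are held out for validation.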
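
To display actual sample data, a batch can be taken from the dataset and its texts printed next to their labels. The sketch below is an illustration under stated assumptions, not the source tutorial's code: it builds a tiny throwaway dataset in a temporary directory (the folder names and sentences are made up) so that it runs on its own, and the names `samples`, `root` and `demo_ds` are hypothetical. The same loop should work on raw_train_ds from the tutorial.

```python
import pathlib
import tempfile

import tensorflow as tf

# Hypothetical miniature dataset: one folder per class, three text files each
samples = {
    "java": ["what is an interface", "how to use a HashMap",
             "explain checked exceptions"],
    "python": ["how do I reverse a list", "what is a decorator",
               "explain list comprehensions"],
}

root = pathlib.Path(tempfile.mkdtemp())
for class_name, texts in samples.items():
    class_dir = root / class_name
    class_dir.mkdir()
    for idx, text in enumerate(texts):
        (class_dir / f"{idx}.txt").write_text(text)

# Load the directory as a labeled, batched dataset
demo_ds = tf.keras.preprocessing.text_dataset_from_directory(
    str(root), batch_size=2, seed=42)

# class_names is read before take(), which returns a plain dataset without it
class_names = demo_ds.class_names

# Take one batch and show each text together with its label
for text_batch, label_batch in demo_ds.take(1):
    for text, label in zip(text_batch.numpy(), label_batch.numpy()):
        print("Text:", text.decode("utf-8"))
        print("Label:", int(label), "maps to", class_names[int(label)])
```

Because the files are shuffled, the texts that appear in the first batch can differ between runs unless the seed is fixed.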

AmitDiwan

Updated on: 18 January 2021
