How can TensorFlow and Python be used to vectorize the text of the Stack Overflow question dataset?

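TensorFlow is an open-source machine learning framework provided by Google. It is used together with Python to implement algorithms, deep learning applications, and much more, in both research and production.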

The 'tensorflow' package can be installed on Windows with the following command:

pip install tensorflow

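A tensor is the data structure used in TensorFlow. Tensors travel along the edges that connect the operations of a graph known as the 'data flow graph'. A tensor is essentially a multidimensional array or list.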
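
To make the "multidimensional array" description concrete, here is a minimal sketch that builds three small tensors and prints their rank and shape. The variable names and values are illustrative only and are not part of the tutorial code.

```python
import tensorflow as tf

# Illustrative tensors of increasing rank (number of dimensions)
scalar = tf.constant(7)                        # rank 0: a single value
vector = tf.constant([1.0, 2.0, 3.0])          # rank 1: a list of values
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])   # rank 2: a list of lists

for name, tensor in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # len(tensor.shape) gives the rank; tuple(tensor.shape) gives the size of each dimension
    print(name, "rank:", len(tensor.shape), "shape:", tuple(tensor.shape))
```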

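We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires zero configuration, and gives free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.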

Example

Following is the code snippet:

# Assumes int_vectorize_layer, binary_vectorize_text, int_vectorize_text and the
# raw_train_ds / raw_val_ds / raw_test_ds datasets were created in earlier steps.

# Look up the tokens stored at two positions of the learned vocabulary
vocabulary = int_vectorize_layer.get_vocabulary()
print("1234 --->", vocabulary[1234])
print("321 --->", vocabulary[321])
print("Vocabulary size is : {}".format(len(vocabulary)))

# Binary (bag-of-words) vectorization of each split
print("The text vectorization is applied to the training dataset")
binary_train_ds = raw_train_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the validation dataset")
binary_val_ds = raw_val_ds.map(binary_vectorize_text)
print("The text vectorization is applied to the test dataset")
binary_test_ds = raw_test_ds.map(binary_vectorize_text)

# Integer (token index sequence) vectorization of each split
int_train_ds = raw_train_ds.map(int_vectorize_text)
int_val_ds = raw_val_ds.map(int_vectorize_text)
int_test_ds = raw_test_ds.map(int_vectorize_text)

Code credit - https://tensorflowcn.cn/tutorials/load_data/text

Output

1234 ---> substring
321 ---> 20
Vocabulary size is : 10000
The text vectorization is applied to the training dataset
The text vectorization is applied to the validation dataset
The text vectorization is applied to the test dataset
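
The example relies on layers, helper functions, and datasets that are created before this step. The sketch below shows one way they might be set up so that the snippet can run end to end. It is an assumption-laden illustration, not the tutorial's exact code: the four sample questions and their labels are made up, the values of VOCAB_SIZE and MAX_SEQUENCE_LENGTH are assumed, and "multi_hot" is used as the output mode because recent TensorFlow releases use that name for what older releases called "binary". It assumes TensorFlow 2.6 or later, where TextVectorization is available under tf.keras.layers.

```python
import tensorflow as tf

VOCAB_SIZE = 10000          # assumed cap on the vocabulary size
MAX_SEQUENCE_LENGTH = 250   # assumed fixed length of each "int" output row

# Made-up stand-in for raw_train_ds: batches of (question text, label) pairs
texts = ["how do i sort a list in python",
         "how to parse json in java",
         "what is a null pointer exception in java",
         "python list comprehension syntax"]
labels = [3, 1, 1, 3]
raw_train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

# One layer per output mode: bag-of-words vector and padded index sequence
binary_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="multi_hot")
int_vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_mode="int",
    output_sequence_length=MAX_SEQUENCE_LENGTH)

# Learn the vocabulary from the training text only (labels are dropped)
train_text = raw_train_ds.map(lambda text, label: text)
binary_vectorize_layer.adapt(train_text)
int_vectorize_layer.adapt(train_text)

def binary_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)   # shape (batch,) -> (batch, 1)
    return binary_vectorize_layer(text), label

def int_vectorize_text(text, label):
    text = tf.expand_dims(text, -1)
    return int_vectorize_layer(text), label

binary_train_ds = raw_train_ds.map(binary_vectorize_text)
int_train_ds = raw_train_ds.map(int_vectorize_text)

# Inspect the first vectorized batch of the "int" dataset
first_int_batch, first_labels = next(iter(int_train_ds))
print("int batch shape:", tuple(first_int_batch.shape))
```

The vocabulary is adapted on the training split only, so the validation and test splits are encoded with the same token-to-index mapping without influencing it.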

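Explanation

As the final preprocessing step, the 'TextVectorization' layers are applied to the training, validation, and test datasets. Each dataset is mapped twice, once with the binary vectorization function and once with the int vectorization function.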
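
To show what the two encodings look like, here is a plain-Python sketch that needs no TensorFlow. It is a simplified illustration of the idea, not the TextVectorization implementation: the function names, the whitespace tokenizer, and the alphabetical tie-breaking are all assumptions made for this example.

```python
from collections import Counter

def build_vocabulary(texts, max_tokens):
    """Most frequent tokens first; index 0 is padding, index 1 is unknown."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Sort by descending frequency, then alphabetically so the order is deterministic
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return ["", "[UNK]"] + ordered[:max_tokens - 2]

def int_vectorize(text, vocab, sequence_length):
    """Encode text as a fixed-length list of vocabulary indices, padded with 0."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in text.lower().split()][:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

def binary_vectorize(text, vocab):
    """Encode text as a 0/1 vector with one slot per token (no padding slot)."""
    bag = vocab[1:]                      # slot 0 is now "[UNK]"
    index = {tok: i for i, tok in enumerate(bag)}
    vector = [0] * len(bag)
    for tok in text.lower().split():
        vector[index.get(tok, 0)] = 1    # unknown tokens mark the "[UNK]" slot
    return vector

train_texts = ["how to sort a list in python", "how to parse json in java"]
vocab = build_vocabulary(train_texts, max_tokens=8)
print(vocab)                                          # ['', '[UNK]', 'how', 'in', 'to', 'a', 'java', 'json']
print(int_vectorize("how to sort json", vocab, 6))    # [2, 4, 1, 7, 0, 0]
print(binary_vectorize("how to sort json", vocab))    # [1, 1, 0, 1, 0, 0, 1]
```

The int encoding keeps word order and pads to a fixed length, while the binary encoding only records which tokens are present.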

AmitDiwan

Updated on: 18-Jan-2021
