How can text data be embedded into dimensional vectors using Python?


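TensorFlow is an open-source machine learning framework provided by Google. It is used with Python to implement algorithms, deep learning applications, and more, for both research and production purposes.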

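Keras was developed as part of the research for project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System). Keras is a deep learning API written in Python. It is a high-level API with a productive interface that helps solve machine learning problems. It runs on top of the TensorFlow framework and was built to support fast experimentation. It provides the essential abstractions and building blocks needed to develop and ship machine learning solutions.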

Keras is already included in the TensorFlow package. It can be accessed with the following lines of code.

import tensorflow
from tensorflow import keras

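The Keras functional API helps create models that are more flexible than those built with the Sequential API. The functional API can handle models with non-linear topology, shared layers, and multiple inputs and outputs. A deep learning model is usually a directed acyclic graph (DAG) of layers, and the functional API is a way to build such graphs of layers.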
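As a minimal sketch of the layer sharing mentioned above, the snippet below reuses a single Embedding layer for two text inputs and merges the results into one output. The input names (text_a, text_b), the vocabulary size of 1000, the 16-dimensional embedding, and the "score" output are hypothetical values chosen only for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# One Embedding instance reused for both inputs, so its weights are shared.
# Vocabulary size (1000) and embedding width (16) are illustrative assumptions.
shared_embedding = layers.Embedding(input_dim=1000, output_dim=16)

# Two variable-length sequences of integer word indices.
text_a = keras.Input(shape=(None,), dtype="int32", name="text_a")
text_b = keras.Input(shape=(None,), dtype="int32", name="text_b")

# Each branch passes through the same embedding, then averages over time steps.
encoded_a = layers.GlobalAveragePooling1D()(shared_embedding(text_a))
encoded_b = layers.GlobalAveragePooling1D()(shared_embedding(text_b))

# The two branches join again, forming a small graph rather than a linear stack.
merged = layers.concatenate([encoded_a, encoded_b])
score = layers.Dense(1, name="score")(merged)

model = keras.Model(inputs=[text_a, text_b], outputs=score)
model.summary()
```

A graph like this, with two inputs feeding one shared layer, is not a plain linear stack of layers, which is why it is written with the functional API rather than the Sequential API.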

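We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser with zero configuration and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook. The following code snippet embeds every word in the title into a 64-dimensional vector: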

Example

from tensorflow import keras
from tensorflow.keras import layers

print("Number of unique issue tags")
num_tags = 12
print("Size of vocabulary while preprocessing text data")
num_words = 10000
print("Number of classes for predictions")
num_classes = 4
# Title and body are variable-length sequences of integer word indices.
title_input = keras.Input(
    shape=(None,), name="title"
)
print("Variable length int sequence")
body_input = keras.Input(shape=(None,), name="body")
# Tags are fixed-size binary vectors of length num_tags.
tags_input = keras.Input(
    shape=(num_tags,), name="tags"
)
print("Embed every word in the title to a 64-dimensional vector")
title_features = layers.Embedding(num_words, 64)(title_input)
print("Embed every word in the body into a 64-dimensional vector")
body_features = layers.Embedding(num_words, 64)(body_input)
print("Reduce sequence of embedded words in the title into a single 128-dimensional vector")
title_features = layers.LSTM(128)(title_features)
print("Reduce sequence of embedded words in the body into a single 32-dimensional vector")
body_features = layers.LSTM(32)(body_features)
print("Merge available features into a single vector by concatenating them")
x = layers.concatenate([title_features, body_features, tags_input])
print("Add a logistic regression for priority and a classifier on top of the features")
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_classes, name="class")(x)
print("Instantiate a model that predicts priority and class")
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

Code credit: https://tensorflowcn.cn/guide/keras/functional

Output

Number of unique issue tags
Size of vocabulary while preprocessing text data
Number of classes for predictions
Variable length int sequence
Embed every word in the title to a 64-dimensional vector
Embed every word in the body into a 64-dimensional vector
Reduce sequence of embedded words in the title into a single 128-dimensional vector
Reduce sequence of embedded words in the body into a single 32-dimensional vector
Merge available features into a single vector by concatenating them
Add a logistic regression for priority and a classifier on top of the features
Instantiate a model that predicts priority and class
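
To make the embedding step concrete, the following sketch passes a small batch of word indices through an Embedding layer and then an LSTM layer and inspects the resulting shapes. The token ids are made-up values, not real preprocessed text; only the layer sizes (10000, 64, 128) come from the example.

```python
import numpy as np
from tensorflow.keras import layers

# Same sizes as the example: a 10000-word vocabulary, 64-dimensional vectors.
embedding = layers.Embedding(input_dim=10000, output_dim=64)

# Hypothetical batch of 2 titles, each encoded as 5 integer word indices.
token_ids = np.array([[12, 7, 431, 9, 2],
                      [5, 88, 3, 640, 27]])

# Every word index is replaced by its 64-dimensional vector.
word_vectors = embedding(token_ids)
print(tuple(word_vectors.shape))  # (2, 5, 64)

# The LSTM reduces each sequence of word vectors to one 128-dimensional vector.
title_vector = layers.LSTM(128)(word_vectors)
print(tuple(title_vector.shape))  # (2, 128)
```

The embedding adds one trailing dimension of size 64 per word, and the LSTM then collapses the sequence dimension, which is what the 128-dimensional and 32-dimensional vectors in the example refer to.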

Explanation

  • The functional API can be used to build models with multiple inputs and outputs.

  • The Sequential API cannot do this.
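
As a rough illustration of the multiple inputs and outputs described above, the sketch below rebuilds the same three-input, two-output model and runs a prediction on random data. The batch size and sequence lengths (8 issues, 10-word titles, 50-word bodies) are arbitrary assumptions, and the untrained model produces meaningless scores; only the shapes are of interest.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_tags, num_words, num_classes = 12, 10000, 4

# Rebuild the three-input, two-output model from the example.
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")

title_features = layers.LSTM(128)(layers.Embedding(num_words, 64)(title_input))
body_features = layers.LSTM(32)(layers.Embedding(num_words, 64)(body_input))

# 128 + 32 + 12 = 172 features after concatenation.
x = layers.concatenate([title_features, body_features, tags_input])

priority_pred = layers.Dense(1, name="priority")(x)
class_pred = layers.Dense(num_classes, name="class")(x)
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, class_pred],
)

# Hypothetical random data: 8 issues, 10-word titles, 50-word bodies.
batch = 8
title_data = np.random.randint(num_words, size=(batch, 10))
body_data = np.random.randint(num_words, size=(batch, 50))
tags_data = np.random.randint(2, size=(batch, num_tags)).astype("float32")

# One call takes three inputs and returns two outputs.
priority_scores, class_logits = model.predict(
    [title_data, body_data, tags_data], verbose=0
)
print(priority_scores.shape)  # (8, 1)
print(class_logits.shape)     # (8, 4)
```

The inputs are passed as a list in the same order as in keras.Model, and the outputs come back in the order in which they were declared.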

Updated on: January 18, 2021
