How can a ragged tensor be built from a list of words using TensorFlow and Python?

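A RaggedTensor can be built from the starting offsets of the words in a sentence. The word start offsets slice the flat sequence of character code points into one row per word, which gives the code point of every character in every word. These code points are displayed on the console, and the number of words in each sentence is then determined from the same word-start information.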
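To make the role of the start offsets concrete, here is a minimal pure-Python sketch (not TensorFlow code) of what grouping a flat list of values by row starts amounts to. The helper name `split_by_row_starts` and the hand-written offsets are assumptions made for illustration only.

```python
def split_by_row_starts(values, row_starts):
    """Split a flat list into rows; row i runs from row_starts[i] to the next start."""
    # The last row extends to the end of the flat list.
    ends = list(row_starts[1:]) + [len(values)]
    return [values[start:end] for start, end in zip(row_starts, ends)]


# Code points of "Hello, there." as one flat list.
flat = [ord(ch) for ch in "Hello, there."]

# Hand-written word start offsets (an assumption for this sketch):
# "Hello" starts at 0, ", " at 5, "there" at 7, "." at 12.
rows = split_by_row_starts(flat, [0, 5, 7, 12])
print(rows)
```

`tf.RaggedTensor.from_row_starts` applies the same idea to tensors: `values` holds the flat data and `row_starts` holds the offset at which each row begins.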

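Unicode strings can be represented in Python and manipulated with the Unicode equivalents of standard string operations. Here, those operations are used to split Unicode strings into tokens based on script detection.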
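As a rough sketch of the first two steps, the snippet below decodes two sentences into code points and looks up the script of each character. The sentence texts are an assumption, chosen to match the output shown in this article; the variable names follow the code further down.

```python
import tensorflow as tf

# Assumed input sentences (chosen to match the output shown in this article).
sentence_texts = ["Hello, there.", "世界こんにちは"]

# Decode each UTF-8 string into a row of Unicode code points.
# The result is a RaggedTensor because the sentences differ in length.
sentence_char_codepoint = tf.strings.unicode_decode(sentence_texts, "UTF-8")

# Look up the ICU script code of every character.
sentence_char_script = tf.strings.unicode_script(sentence_char_codepoint)

print(sentence_char_codepoint)
print(sentence_char_script)
```

A change of script between two neighbouring characters is what this approach treats as a word boundary.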

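We use Google Colaboratory to run the code below. Google Colab (Colaboratory) runs Python code in the browser with no configuration required and provides free access to GPUs (Graphical Processing Units). Colaboratory is built on top of Jupyter Notebook.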

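import tensorflow as tf

# Assumes sentence_char_codepoint, word_starts and sentence_char_starts_word
# were already defined in earlier steps of the tutorial linked below.
print("Get the code point of every character in every word")
# Slice the flat code point values into one row per word.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
    values=sentence_char_codepoint.values,
    row_starts=word_starts)
print(word_char_codepoint)
print("Get the number of words in the specific sentence")
# Count the word-starting characters in each sentence.
sentence_num_words = tf.reduce_sum(tf.cast(sentence_char_starts_word, tf.int64), axis=1)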

Code credit: https://tensorflowcn.cn/tutorials/load_data/unicode

Output

Get the code point of every character in every word
<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [116, 104, 101, 114, 101], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>
Get the number of words in the specific sentence

Explanation

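The code point of every character in every word is built.
These code points are displayed on the console.
The number of words in each sentence is determined.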
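The snippet above relies on names defined earlier in the referenced tutorial. The following self-contained sketch fills in those steps so the whole pipeline can be run in one cell. The input sentences are an assumption chosen to match the output shown above, and the final regrouping with `tf.RaggedTensor.from_row_lengths` goes one step beyond the snippet.

```python
import tensorflow as tf

# Assumed input sentences (chosen to match the output shown above).
sentence_texts = ["Hello, there.", "世界こんにちは"]
sentence_char_codepoint = tf.strings.unicode_decode(sentence_texts, "UTF-8")
sentence_char_script = tf.strings.unicode_script(sentence_char_codepoint)

# A character starts a word if it is the first character of its sentence
# or if its script differs from the script of the previous character.
sentence_char_starts_word = tf.concat(
    [tf.fill([sentence_char_script.nrows(), 1], True),
     tf.not_equal(sentence_char_script[:, 1:], sentence_char_script[:, :-1])],
    axis=1)

# Offsets of the word-starting characters in the flattened character sequence.
word_starts = tf.squeeze(tf.where(sentence_char_starts_word.values), axis=1)

# One row of code points per word, across both sentences.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
    values=sentence_char_codepoint.values,
    row_starts=word_starts)

# Number of words in each sentence.
sentence_num_words = tf.reduce_sum(
    tf.cast(sentence_char_starts_word, tf.int64), axis=1)
print(sentence_num_words)

# Group the words back by sentence: sentences x words x characters.
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint,
    row_lengths=sentence_num_words)
print(sentence_word_char_codepoint)
```

With these inputs the first sentence splits into four words and the second into two, so `sentence_num_words` holds 4 and 2. Punctuation and spaces count as "words" here because the split is driven purely by script changes.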

AmitDiwan

Updated on: 20-Feb-2021
