How can TensorFlow be used to process character substrings in Python?

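Character substrings can be extracted with the `substr` method of TensorFlow's `strings` module (`tf.strings.substr`). The result is a string tensor, which can then be converted to a NumPy array with `.numpy()` and displayed.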
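As a rough illustration of that workflow, the sketch below applies `tf.strings.substr` to a small batch of strings and converts the results to NumPy arrays. The sample words are made up for illustration. It assumes TensorFlow 2.x in eager mode, where `.numpy()` on a string tensor returns an object array of `bytes`, and it relies on `pos` and `len` being broadcast to the input shape (or given per element with the same shape).

```python
import tensorflow as tf

# Hypothetical batch of strings, stored as a tf.string tensor.
words = tf.constant(["Hello", "World", "TensorFlow"])

# Scalar pos/len are broadcast to every element: take 3 bytes starting at offset 1.
common = tf.strings.substr(words, pos=1, len=3)

# pos and len can also be given per element, with the same shape as the input.
per_item = tf.strings.substr(words, pos=[0, 1, 2], len=[2, 2, 2])

# .numpy() turns each string tensor into a NumPy array of bytes objects (dtype=object).
common_np = common.numpy()
per_item_np = per_item.numpy()
print(common_np)    # [b'ell' b'orl' b'ens']
print(per_item_np)  # [b'He' b'or' b'ns']
```

Because no `unit` is given here, the offsets count bytes. For plain ASCII text like this, bytes and characters coincide.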

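We will see how to represent Unicode strings in Python with TensorFlow and manipulate them with the Unicode equivalents of standard string operations. As a first step, Unicode strings can be split into tokens based on script detection.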
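The following is a simplified sketch of script-based tokenization. It is not the article's own method. It decodes a string into code points with `tf.strings.unicode_decode`, looks up each code point's ICU script with `tf.strings.unicode_script`, and then groups consecutive characters that share a script using Python's `itertools.groupby`. This grouping step replaces the ragged-tensor operations that a full TensorFlow pipeline would likely use. The helper name `split_by_script` and the sample input are hypothetical.

```python
import itertools

import tensorflow as tf


def split_by_script(text):
    """Hypothetical helper: split one string into runs of characters sharing a script."""
    # Decode the UTF-8 string into a 1-D int32 tensor of Unicode code points.
    codepoints = tf.strings.unicode_decode(text, input_encoding="UTF-8")
    # Look up the ICU script code of each code point (e.g. Latin vs. Han).
    scripts = tf.strings.unicode_script(codepoints)
    tokens = []
    # Group consecutive (code point, script) pairs that have the same script code.
    pairs = zip(codepoints.numpy(), scripts.numpy())
    for _, run in itertools.groupby(pairs, key=lambda pair: pair[1]):
        tokens.append("".join(chr(int(cp)) for cp, _ in run))
    return tokens


tokens = split_by_script("abc中文xyz")
print(tokens)  # ['abc', '中文', 'xyz']
```

Spaces and most punctuation belong to the "Common" script, so in this sketch they would form tokens of their own. A real tokenizer would probably treat them separately.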

We use Google Colaboratory to run the code below. Google Colab (Colaboratory) lets you run Python code in the browser with no setup and gives free access to GPUs (graphics processing units). Colaboratory is built on top of Jupyter Notebook.

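import tensorflow as tf

# Assumption: `thanks` is defined as in the linked TensorFlow tutorial; the emoji takes 4 bytes in UTF-8
thanks = u'Thanks 😊'.encode('UTF-8')
print("The default unit is byte")
print("When len is 1, a single byte is returned")
# unit='BYTE' (default): pos and len count bytes, so this returns only the first byte of the emoji
print(tf.strings.substr(thanks, pos=7, len=1).numpy())
print("The unit is specified as UTF8_CHAR")
print("It takes up 4 bytes")
# unit='UTF8_CHAR': pos and len count characters, so this returns the whole 4-byte emoji
print(tf.strings.substr(thanks, pos=7, len=1, unit='UTF8_CHAR').numpy())

Code source: https://tensorflowcn.cn/tutorials/load_data/unicode

Output

The default unit is byte
When len is 1, a single byte is returned
b'\xf0'
The unit is specified as UTF8_CHAR
It takes up 4 bytes
b'\xf0\x9f\x98\x8a'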

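Explanation

The tf.strings.substr operation takes a "unit" parameter.
It uses this parameter to decide which kind of offset the "pos" and "len" arguments express: byte offsets when unit is "BYTE" (the default), or character offsets when unit is "UTF8_CHAR".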
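To make the effect of the unit parameter concrete, here is a small sketch on a hypothetical string containing "é" (U+00E9), which occupies 2 bytes in UTF-8. It assumes TensorFlow 2.x eager execution. It uses `tf.strings.length` and `tf.strings.substr`, both of which accept the same `unit` values, so the same `pos`/`len` pair can be compared under each unit.

```python
import tensorflow as tf

# Hypothetical sample: "h\u00e9llo" is 5 characters but 6 bytes in UTF-8.
text = tf.constant("h\u00e9llo")

# Length counted in bytes (default unit) vs. in UTF-8 characters.
byte_len = tf.strings.length(text).numpy()
char_len = tf.strings.length(text, unit="UTF8_CHAR").numpy()

# The same pos/len selects different spans depending on the unit.
byte_slice = tf.strings.substr(text, pos=1, len=1).numpy()
char_slice = tf.strings.substr(text, pos=1, len=1, unit="UTF8_CHAR").numpy()

print(byte_len, char_len)       # 6 5
print(byte_slice, char_slice)   # b'\xc3' b'\xc3\xa9'
print(char_slice.decode("utf-8"))  # é
```

With the byte unit, the slice cuts the 2-byte character in half, and the result on its own is not valid UTF-8. With "UTF8_CHAR", the slice keeps the character whole.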

AmitDiwan

Updated on: 20-Feb-2021
