How can a dataset of Stack Overflow questions be loaded using Python and TensorFlow?
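TensorFlow is an open-source machine learning framework provided by Google. Used together with Python, it can implement algorithms, deep learning applications, and much more, for both research and production purposes. It includes optimization techniques that help perform complex mathematical operations quickly.
This is because it works with NumPy and multidimensional arrays, which are also known as "tensors". The framework supports deep neural networks, is highly scalable, and comes with many popular datasets. It can use GPU computation and manages resources automatically. It ships with a large number of machine learning libraries and is well supported and documented. The framework can run deep neural network models, train them, and build applications that predict relevant characteristics of the respective datasets.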
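To make the idea of a tensor concrete, here is a minimal sketch that turns a NumPy array into a TensorFlow tensor and multiplies it by its own transpose. The array values and the variable names are chosen purely for illustration and are not part of the original tutorial.

```python
import numpy as np
import tensorflow as tf

# A 2x3 NumPy array with illustrative values 0..5.
a = np.arange(6, dtype=np.float32).reshape(2, 3)

# Wrap the array in a tensor; the shape and dtype carry over.
t = tf.constant(a)

# Multiply the tensor by its transpose, giving a 2x2 result.
product = tf.matmul(t, tf.transpose(t))

print(t.shape)
print(product.numpy())
```

The result can be converted back to a NumPy array with `.numpy()`, which is the same call the example later in this article uses to print batches of text.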
The "tensorflow" package can be installed on Windows using the following line of code:
pip install tensorflow
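We use Google Colaboratory to run the code below. Google Colab, or Colaboratory, runs Python code in the browser, requires no configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook. Following is the code snippet that loads a dataset of Stack Overflow questions with Python:
Example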
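from tensorflow.keras import preprocessing

# train_dir is assumed to point to the dataset's training directory,
# which contains one subdirectory of text files per class.
batch_size = 32
seed = 42
print("The training parameters have been defined")
raw_train_ds = preprocessing.text_dataset_from_directory(
    train_dir,
    batch_size=batch_size,
    validation_split=0.25,
    subset='training',
    seed=seed)
# Print the first 10 questions of one batch, truncated to 100 bytes each.
for text_batch, label_batch in raw_train_ds.take(1):
    for i in range(10):
        print("Question: ", text_batch.numpy()[i][:100], '...')
        print("Label:", label_batch.numpy()[i])

Code credit: https://tensorflowcn.cn/tutorials/load_data/text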
Output
The training parameters have been defined
Found 8000 files belonging to 4 classes.
Using 6000 files for training.
Question: b'"my tester is going to the wrong constructor i am new to programming so if i ask a question that can' ...
Label: 1
Question: b'"blank code slow skin detection this code changes the color space to lab and using a threshold finds' ...
Label: 3
Question: b'"option and validation in blank i want to add a new option on my system where i want to add two text' ...
Label: 1
Question: b'"exception: dynamic sql generation for the updatecommand is not supported against a selectcommand th' ...
Label: 0
Question: b'"parameter with question mark and super in blank, i\'ve come across a method that is formatted like t' ...
Label: 1
Question: b'call two objects wsdl the first time i got a very strange wsdl. ..i would like to call the object (i' ...
Label: 0
Question: b'how to correctly make the icon for systemtray in blank using icon sizes of any dimension for systemt' ...
Label: 0
Question: b'"is there a way to check a variable that exists in a different script than the original one? i\'m try' ...
Label: 3
Question: b'"blank control flow i made a number which asks for 2 numbers with blank and responds with the corre' ...
Label: 0
Question: b'"credentials cannot be used for ntlm authentication i am getting org.apache.commons.httpclient.auth.' ...
Label: 1
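The example loads only the 'training' subset. A matching validation subset is normally obtained by calling the same utility again with subset='validation' and the same seed and validation_split, so that the two subsets do not overlap. The sketch below shows this on a tiny synthetic directory rather than the real Stack Overflow data: the two class names, the file contents, and the batch size of 2 are all made up for illustration. It uses `tf.keras.utils.text_dataset_from_directory`, which recent TensorFlow releases document as the same utility the example reaches through `preprocessing`.

```python
import os
import tempfile

import tensorflow as tf

# Build a tiny stand-in dataset: two hypothetical classes, 4 text files each.
root = tempfile.mkdtemp()
for class_name in ("java", "python"):
    class_dir = os.path.join(root, class_name)
    os.makedirs(class_dir)
    for i in range(4):
        with open(os.path.join(class_dir, f"{i}.txt"), "w") as f:
            f.write(f"sample {class_name} question {i}")

# Both calls must share the same split fraction and seed.
common = dict(batch_size=2, validation_split=0.25, seed=42)
train_ds = tf.keras.utils.text_dataset_from_directory(
    root, subset="training", **common)
val_ds = tf.keras.utils.text_dataset_from_directory(
    root, subset="validation", **common)

# Count the examples in each subset by summing the batch sizes.
n_train = sum(int(labels.shape[0]) for _, labels in train_ds)
n_val = sum(int(labels.shape[0]) for _, labels in val_ds)
print(train_ds.class_names, n_train, n_val)
```

With 8 files and a 0.25 split, 2 files are held out for validation and 6 remain for training, which mirrors the 6000 of 8000 files reported in the output above.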
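Explanation
The data is loaded from disk and prepared in a form suitable for training.
The "text_dataset_from_directory" utility is used to create a labeled dataset.
"tf.data" is a powerful collection of tools for building input pipelines.
The path of the training directory, which holds one subdirectory per class, is passed to the "text_dataset_from_directory" utility.
The Stack Overflow question dataset is divided into a training dataset and a test dataset.
A validation set is created with the "validation_split" argument; here 25% of the training files are held out, leaving 6000 of the 8000 files for training.
The labels are 0, 1, 2, or 3.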
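The points above can be sketched with the standard library alone. When labels are inferred from the directory structure, the utility assigns label indices to the class subdirectories in alphanumerical order, and the validation split is simple arithmetic on the file count. The four class names below are an assumption made for illustration, since this article does not list them, and the code only mimics the behaviour; it is not the library's implementation.

```python
import os
import tempfile

# Hypothetical class names, assumed for illustration only.
class_names = ["python", "java", "csharp", "javascript"]

# Create one empty subdirectory per class, as the utility expects.
root = tempfile.mkdtemp()
for name in class_names:
    os.makedirs(os.path.join(root, name))

# Sorting the subdirectory names gives the label index of each class.
label_of = {name: idx for idx, name in enumerate(sorted(os.listdir(root)))}
print(label_of)

# Split arithmetic for the file counts reported in the output.
total_files = 8000
validation_split = 0.25
n_val = int(total_files * validation_split)
n_train = total_files - n_val
print(n_train, n_val)
```

Under these assumptions, label 0 would be the alphabetically first class and label 3 the last, and the split leaves 6000 files for training and 2000 for validation.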