Creating a Basic Hardcoded Chatbot Using Python-NLTK

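What is a chatbot?

Chatbots have grown increasingly popular in recent years because they automate simple conversations between users and software platforms. A chatbot responds to user input and can interpret natural-language input. Python-NLTK (the Natural Language Toolkit) is a powerful library for natural language processing (NLP) tasks. In this tutorial, we will use Python-NLTK to create a simple hardcoded chatbot.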

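What are the core concepts of chatbot creation?

The core concepts of chatbot creation include:

Natural language processing (NLP) - Chatbots use NLP to understand human language and interpret the user's intent. NLP includes techniques such as tokenization, part-of-speech tagging, and named entity recognition.
Dialogue management - Dialogue management controls the flow of the conversation and maintains context across multiple turns.
Machine learning - Machine learning is used to train chatbots to recognize patterns in data, make predictions, and improve over time. Supervised learning, unsupervised learning, and reinforcement learning are all used in chatbot development.
APIs and integrations - Chatbots often need to integrate with external services and APIs to provide information or complete tasks for the user.
User experience (UX) - User experience is critical for chatbots, which should be intuitive and easy to use. UX considerations include designing the conversation flow, choosing appropriate response types, and giving users clear and helpful feedback.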
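
As a rough illustration of the dialogue-management idea above, the sketch below carries a small amount of context from one turn to the next. The `DialogueContext` class and its `respond` method are hypothetical names invented for this example; they are not part of NLTK or of the chatbot built later in this tutorial.

```python
import re


class DialogueContext:
    """Hypothetical helper that carries state from one turn to the next."""

    def __init__(self):
        self.user_name = None  # filled in once the user introduces themselves
        self.turns = []        # every user message seen so far

    def respond(self, message):
        self.turns.append(message)
        # Look for an introduction such as "my name is Ada"
        match = re.search(r"my name is (\w+)", message, re.IGNORECASE)
        if match:
            self.user_name = match.group(1)
            return f"Nice to meet you, {self.user_name}!"
        # Reuse the remembered name on later turns
        if self.user_name:
            return f"Tell me more, {self.user_name}."
        return "Tell me more."


context = DialogueContext()
for line in ["Hello", "My name is Ada", "I like Python"]:
    print(context.respond(line))
```

The third reply uses the name learned in the second turn, which is the kind of context a purely stateless pattern matcher cannot keep.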

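Prerequisites

Before we dive into the task, a few things need to be installed on your system:

Recommended setup:

pip install pandas matplotlib
Users are expected to have access to a standalone IDE such as VS Code, PyCharm, Atom, or Sublime Text.
An online Python environment such as Kaggle.com, Google Cloud Platform, or any other can also be used.
A recent version of Python. At the time of writing, I used version 3.10.9.
Know how to use Jupyter notebook.
Knowledge and use of virtual environments is helpful but not required.
Users are also expected to have a good understanding of statistics and mathematics.
Install Python-NLTK (http://www.nltk.org/install.html).
Familiarity with text processing (tokenization, lemmatization, stemming).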

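Installing the required libraries

First, we need to install the libraries required to develop the chatbot. The chatbot uses NLTK together with the re, random, and string modules. Only NLTK has to be installed with pip; re, random, and string are part of the Python standard library and cannot be installed from pip.

!pip install nltk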
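
To see which of these modules the current interpreter can already find, a check along the following lines may help. It is only a sketch: the `is_available` helper is a name made up for this example, and it relies on `importlib.util.find_spec` from the standard library.

```python
import importlib.util


def is_available(module_name):
    """Return True if Python can locate the module without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("re", "random", "string", "nltk"):
    if is_available(name):
        print(f"{name}: available")
    else:
        print(f"{name}: missing (try: pip install {name})")
```

On a standard Python installation, re, random, and string should be reported as available without any pip step; whether nltk is found depends on your environment.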

Importing the required libraries

With the libraries installed, we need to import them into the Python notebook. The code below does that.

import nltk
import re
import random
import string
from string import punctuation

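Data preprocessing

With the required packages installed and imported, we need to preprocess the data. Preprocessing involves removing unnecessary data, tokenizing the text into sentences and words, and removing stop words. Stop words are very common words that carry little or no meaning in the context of a conversation, such as "a" and "is".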

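# Download the tokenizer models and the stop-word list from NLTK
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load the tokenizer from this package
nltk.download('stopwords')
stop_words = set(nltk.corpus.stopwords.words('english'))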

def sentence_tokenizer(data):
   # Function for Sentence Tokenization
   return nltk.sent_tokenize(data.lower())

def word_tokenizer(data):
   # Function for Word Tokenization
   return nltk.word_tokenize(data.lower())

def remove_noise(word_tokens):
   # Function to remove stop words and punctuation
   cleaned_tokens = []
   for token in word_tokens:
      # Skip stop words and tokens made up only of punctuation (e.g. "..." or "``")
      if token in stop_words or all(ch in punctuation for ch in token):
         continue
      cleaned_tokens.append(token)
   return cleaned_tokens
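
To make the effect of these steps concrete without downloading any NLTK data, here is a dependency-free sketch of the same idea. It is an approximation, not NLTK's behavior: `demo_word_tokenizer`, `demo_remove_noise`, and the tiny `DEMO_STOP_WORDS` set are stand-ins invented for this example, and NLTK's tokenizer and English stop-word list are considerably more thorough.

```python
import re
from string import punctuation

# Tiny illustrative stop-word set; NLTK's English list is much longer
DEMO_STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and"}


def demo_word_tokenizer(text):
    # Lowercase, then split into runs of word characters or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text.lower())


def demo_remove_noise(tokens):
    # Drop stop words and single-character ASCII punctuation tokens
    return [t for t in tokens if t not in DEMO_STOP_WORDS and t not in punctuation]


tokens = demo_word_tokenizer("The weather is nice, and the sky is blue!")
print(tokens)
# ['the', 'weather', 'is', 'nice', ',', 'and', 'the', 'sky', 'is', 'blue', '!']
print(demo_remove_noise(tokens))
# ['weather', 'nice', 'sky', 'blue']
```

Only the content-bearing words survive, which is what the chatbot works with after preprocessing.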

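Building the chatbot

Now that the preprocessing functions are in place, we are ready to build the chatbot. Its flow can be summarized in the following steps:

Define the list of patterns and responses
Start an infinite while loop
Let the user enter a query
Tokenize the query and remove stop words
Match the query against one of the patterns and return a response.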

# Define the patterns and responses. Patterns are tried in order and the
# first match wins, so the catch-all (\w+) comes last.
patterns = [
   (r'\b(hi|hello|hey)\b', ['Hi there!', 'Hello!', 'Hey!']),
   (r'\b(bye|goodbye)\b', ['Bye', 'Goodbye!']),
   (r'\?', ['I’m sorry, but I can’t answer that', 'Please ask me another question', 'I’m not sure what you mean.']),
   (r'\w+', ['Yes, go on', 'Tell me more', 'I’m listening...']),
]

# Function to generate a response for the user input
def generate_response(user_input, cleaned_tokens):
   # Append the cleaned tokens to the chat history
   conversation_history.append(cleaned_tokens)
   # Match against the cleaned tokens; re-attach '?' because remove_noise strips punctuation
   text = ' '.join(cleaned_tokens)
   if '?' in user_input:
      text += ' ?'
   # Reply with a random response from the first pattern that matches
   for pattern, responses in patterns:
      if re.search(pattern, text):
         return random.choice(responses)
   # Nothing matched, e.g. the input contained only stop words
   return 'Tell me more'

# Main loop of chatbot
conversation_history = []
while True:
   # User input
   user_input = input("You: ")
   # End the loop if the user says bye or goodbye
   if user_input.strip().lower() in ['bye', 'goodbye']:
      print('Chatbot: Goodbye!')
      break
   # Tokenize the user input
   user_input_tokenized = word_tokenizer(user_input)
   # Remove stop words
   user_input_nostops = remove_noise(user_input_tokenized)
   # Process the query and generate a response
   chatbot_response = generate_response(user_input, user_input_nostops)
   # Print the response
   print('Chatbot:', chatbot_response)
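
Because the patterns are regular expressions, two details influence which response group a query lands in: word boundaries and the order of the list. The sketch below illustrates both with a reduced pattern list. The `first_match` helper and the labels are hypothetical and exist only for this example.

```python
import re


def first_match(text, patterns):
    """Return the label of the first pattern that matches, or None."""
    for regex, label in patterns:
        if re.search(regex, text):
            return label
    return None


loose = [(r"hi|hello|hey", "greeting"), (r"\w+", "catch-all")]
strict = [(r"\b(hi|hello|hey)\b", "greeting"), (r"\w+", "catch-all")]

# Without word boundaries, the "hi" inside "this" counts as a greeting
print(first_match("this is fine", loose))   # greeting
# With \b on both sides, only the standalone word matches
print(first_match("this is fine", strict))  # catch-all

# With the catch-all first, a more specific pattern is never reached
reordered = [(r"\w+", "catch-all"), (r"\?", "question")]
print(first_match("why?", reordered))       # catch-all
```

Placing specific patterns before general ones, and anchoring keywords with `\b`, keeps a simple rule list predictable as it grows.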

Final program: complete code

import nltk
import re
import random
import string

from string import punctuation

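# Download the tokenizer models and the stop-word list from NLTK
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load the tokenizer from this package
nltk.download('stopwords')
stop_words = set(nltk.corpus.stopwords.words('english'))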

def sentence_tokenizer(data):
   # Function for Sentence Tokenization
   return nltk.sent_tokenize(data.lower())

def word_tokenizer(data):
   # Function for Word Tokenization
   return nltk.word_tokenize(data.lower())

def remove_noise(word_tokens):
   # Function to remove stop words and punctuation
   cleaned_tokens = []
   for token in word_tokens:
      # Skip stop words and tokens made up only of punctuation (e.g. "..." or "``")
      if token in stop_words or all(ch in punctuation for ch in token):
         continue
      cleaned_tokens.append(token)
   return cleaned_tokens

# Define the patterns and responses. Patterns are tried in order and the
# first match wins, so the catch-all (\w+) comes last.
patterns = [
   (r'\b(hi|hello|hey)\b', ['Hi there!', 'Hello!', 'Hey!']),
   (r'\b(bye|goodbye)\b', ['Bye', 'Goodbye!']),
   (r'\?', ['I’m sorry, but I can’t answer that', 'Please ask me another question', 'I’m not sure what you mean.']),
   (r'\w+', ['Yes, go on', 'Tell me more', 'I’m listening...']),
]

# Function to generate a response for the user input
def generate_response(user_input, cleaned_tokens):
   # Append the cleaned tokens to the chat history
   conversation_history.append(cleaned_tokens)
   # Match against the cleaned tokens; re-attach '?' because remove_noise strips punctuation
   text = ' '.join(cleaned_tokens)
   if '?' in user_input:
      text += ' ?'
   # Reply with a random response from the first pattern that matches
   for pattern, responses in patterns:
      if re.search(pattern, text):
         return random.choice(responses)
   # Nothing matched, e.g. the input contained only stop words
   return 'Tell me more'

# Main loop of chatbot
conversation_history = []
while True:
   # User input
   user_input = input("You: ")
   # End the loop if the user says bye or goodbye
   if user_input.strip().lower() in ['bye', 'goodbye']:
      print('Chatbot: Goodbye!')
      break
   # Tokenize the user input
   user_input_tokenized = word_tokenizer(user_input)
   # Remove stop words
   user_input_nostops = remove_noise(user_input_tokenized)
   # Process the query and generate a response
   chatbot_response = generate_response(user_input, user_input_nostops)
   # Print the response
   print('Chatbot:', chatbot_response)
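
Since the program above waits on input() forever, it can be awkward to try out in an automated way. One possible approach, sketched here under the assumption that the loop is moved into a function, is to feed it a fixed list of messages instead of keyboard input. The `run_chat` and `echo_bot` names are hypothetical and not part of the program above.

```python
def run_chat(messages, respond):
    """Feed a fixed list of messages through a chat loop and collect the replies.

    `messages` stands in for repeated input() calls; `respond` is any
    function that maps a message to a reply.
    """
    transcript = []
    for message in messages:
        # Same exit rule as the interactive loop
        if message.strip().lower() in ("bye", "goodbye"):
            transcript.append("Goodbye!")
            break
        transcript.append(respond(message))
    return transcript


def echo_bot(message):
    # Placeholder responder used only to exercise the loop
    return f"You said: {message}"


print(run_chat(["hello", "how are you", "bye", "never reached"], echo_bot))
# ['You said: hello', 'You said: how are you', 'Goodbye!']
```

Any responder with the same one-argument shape could be passed in place of `echo_bot`, which makes scripted conversations easy to repeat.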