Python - Stemming Algorithms
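In natural language processing we often meet two or more words that share a common root. For example, the three words agreed, agreeing, and agreeable all have the root word agree. A search involving any of these words should treat them as the same word, namely the root. It therefore becomes essential to link every word to its root. The NLTK library provides stemmers that perform this linking and return the stem of each word. Note that a stem is not always a dictionary word.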
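To make the idea concrete, here is a minimal sketch of naive suffix stripping. It is a toy illustration only, not the Porter, Lancaster, or Snowball algorithm, and the names `SUFFIXES`, `toy_stem`, and `group_by_stem` are hypothetical helpers introduced for this example.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny rule set (assumption for illustration only).
# Longer suffixes come first so "able" is tried before "s".
SUFFIXES = ("able", "ing", "ed", "s")


def toy_stem(word):
    """Strip one known suffix, then any trailing 'e' characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Fall back to the unstripped form if nothing but 'e's is left.
    return word.rstrip("e") or word


def group_by_stem(words):
    """Map each stem to the list of original words that reduce to it."""
    groups = defaultdict(list)
    for w in words:
        groups[toy_stem(w)].append(w)
    return dict(groups)


groups = group_by_stem(["agree", "agreed", "agreeing", "agreeable", "head"])
print(groups)
```

With this rule set, all four forms of agree fall into one group under the stem "agr", so a search index keyed on stems would treat them as the same term. Real stemmers apply far more careful rules, which is why a library implementation is preferable in practice.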
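NLTK offers three commonly used stemming algorithms: Porter, Lancaster, and Snowball. They give slightly different results. The example below runs all three on the same sentence and prints their stems.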
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")

word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"

# First, tokenize the sentence into words.
# word_tokenize needs NLTK's Punkt tokenizer data, installed with nltk.download().
nltk_tokens = nltk.word_tokenize(word_data)

# Next, find the stem of each word with each stemmer.
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))
When we run the above program, we get the following output:
***PorterStemmer****

Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famou
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: hi
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: hi
Actual: subalterns || Stem: subaltern

***LancasterStemmer****

Actual: Aging || Stem: ag
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: fam
Actual: crime || Stem: crim
Actual: family || Stem: famy
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transf
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: on
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern

***SnowballStemmer****

Actual: Aging || Stem: age
Actual: head || Stem: head
Actual: of || Stem: of
Actual: famous || Stem: famous
Actual: crime || Stem: crime
Actual: family || Stem: famili
Actual: decides || Stem: decid
Actual: to || Stem: to
Actual: transfer || Stem: transfer
Actual: his || Stem: his
Actual: position || Stem: posit
Actual: to || Stem: to
Actual: one || Stem: one
Actual: of || Stem: of
Actual: his || Stem: his
Actual: subalterns || Stem: subaltern
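The differences between the three stemmers are easier to see side by side. The sketch below hard-codes the stems from the output above, so it runs without NLTK, and lists the tokens on which the stemmers disagree. The names `RESULTS` and `disagreements` are introduced here for illustration only.

```python
# Stems copied from the output above, as (Porter, Lancaster, Snowball).
# Repeated tokens ("to", "of", "his") are listed once.
RESULTS = {
    "Aging": ("age", "ag", "age"),
    "head": ("head", "head", "head"),
    "of": ("of", "of", "of"),
    "famous": ("famou", "fam", "famous"),
    "crime": ("crime", "crim", "crime"),
    "family": ("famili", "famy", "famili"),
    "decides": ("decid", "decid", "decid"),
    "to": ("to", "to", "to"),
    "transfer": ("transfer", "transf", "transfer"),
    "his": ("hi", "his", "his"),
    "position": ("posit", "posit", "posit"),
    "one": ("one", "on", "one"),
    "subalterns": ("subaltern", "subaltern", "subaltern"),
}

# A token is a disagreement when the three stems are not all identical.
disagreements = [
    token for token, stems in RESULTS.items() if len(set(stems)) > 1
]

for token in disagreements:
    porter, lancaster, snowball = RESULTS[token]
    print("%-10s Porter=%-10s Lancaster=%-10s Snowball=%s"
          % (token, porter, lancaster, snowball))
```

On this sentence, Lancaster produces the shortest stems (for example "transf" and "on"), while Porter and Snowball mostly agree, differing only on "famous" and "his".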