Python - Text Summarization
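Text summarization means producing a short summary from a large body of text, one that conveys at least part of what the full text is about. In the following example we use the gensim module and its summarize function for this purpose. Note that the gensim.summarization module was removed in gensim 4.0, so the examples here need a gensim 3.x release. We install the following package first.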
pip install gensim_sum_ext
The paragraph below describes the plot of a movie. We apply the summarize function to this text to get a summary a few lines long.
from gensim.summarization import summarize

# Plot synopsis to summarize; adjacent string literals are joined into one string.
text = (
    "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones "
    "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"
    "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "
    "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "
    "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding "
    " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "
    "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"
    "refused their advances; the men received minimal punishment from the presiding judge. "
    "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's"
    "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, "
    "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees "
    "to have his men punish the young men responsible (in a non-lethal manner) in return for "
    "future service if necessary."
)

# Print the extracted summary (print() with one argument also runs on Python 2).
print(summarize(text))
When we run the above program, we get the following output:
He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding day.
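To make the idea of extractive summarization more concrete, here is a minimal sketch that uses only the standard library. It is not gensim's method: gensim's summarize is documented as a TextRank variant, while this sketch swaps in a much simpler word-frequency heuristic. The names simple_summarize, split_sentences, tokenize and STOPWORDS are hypothetical helpers introduced for illustration, and the stopword list is a small assumed sample.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list gensim uses).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "he", "his",
    "in", "is", "of", "on", "she", "the", "their", "to", "was", "who", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def simple_summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order.

    A sentence's score is the average document-wide frequency of its words.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, then restore document order for the output.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = "Cats sleep a lot. Cats and dogs are popular pets. The weather was mild."
print(simple_summarize(sample, max_sentences=1))
```

Averaging the word frequencies, rather than summing them, keeps long sentences from winning on length alone. A graph-based ranker such as TextRank instead scores each sentence by how similar it is to the other sentences.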
Extracting Keywords
We can also extract keywords from a text by using the keywords function from the gensim library, as shown below.
from gensim.summarization import keywords

# Plot synopsis to analyze; adjacent string literals are joined into one string.
text = (
    "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones "
    "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"
    "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "
    "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "
    "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding "
    " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "
    "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"
    "refused their advances; the men received minimal punishment from the presiding judge. "
    "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's"
    "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, "
    "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees "
    "to have his men punish the young men responsible (in a non-lethal manner) in return for "
    "future service if necessary."
)

# Print the extracted keywords (print() with one argument also runs on Python 2).
print(keywords(text))
When we run the above program, we get the following output:
corleone men corleones daughter wedding summer new vito family hagen robert
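As a rough illustration of keyword extraction, the sketch below ranks words by raw frequency after dropping stopwords, using only the standard library. This is a simplified stand-in and should not be read as how gensim's keywords function works internally. The name simple_keywords, the top_n parameter and the stopword list are assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list gensim uses).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "he", "his",
    "in", "is", "of", "on", "she", "the", "their", "to", "was", "who", "with",
}


def simple_keywords(text, top_n=5):
    """Return up to top_n of the most frequent non-stopword tokens in the text.

    Ties are broken alphabetically so the result is deterministic.
    """
    words = re.findall(r"[a-z]+", text.lower())
    # Ignore stopwords and very short tokens, which rarely make useful keywords.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


sample = "The Don agrees to help. The Don asks for loyalty in return for the help."
print(simple_keywords(sample, top_n=3))
```

Raw frequency tends to favor names and repeated nouns, which is consistent with the kind of words in the output above, but it treats every occurrence independently and ignores which words appear near each other.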