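Python - Reformatting Paragraphs
Paragraphs need formatting when we process large amounts of text and want to bring it into a presentable form. We may simply want to print each line at a specific width, or adjust the indentation of each line when printing a poem. In this chapter we use a module named textwrap3 to format paragraphs as needed.
First, install the required package as follows: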
pip install textwrap3
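textwrap3 is a backport of the textwrap module that ships with Python 3, so on a Python 3 interpreter the same functions are also available without installing anything. The following is a minimal sketch, not part of the original chapter, that prefers textwrap3 when it is installed and otherwise falls back to the standard library; the alias `tw` and the sample sentence are illustrative assumptions.

```python
# Prefer the textwrap3 backport if present; otherwise use the
# standard-library module, which exposes the same wrap() function.
try:
    import textwrap3 as tw
except ImportError:
    import textwrap as tw

sample = "The quick brown fox jumps over the lazy dog."  # illustrative text

# wrap() returns a list of lines, each at most `width` characters long.
lines = tw.wrap(sample, width=20)
for line in lines:
    print(line)
```

With either module the call is the same, so the examples in this chapter should work unchanged after swapping the import.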
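Wrapping to a Fixed Width
In this example we limit each line of the paragraph to a width of 30 characters. This is done by calling the wrap function with a value for its width parameter; wrap returns the paragraph as a list of lines.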
from textwrap3 import wrap

text = 'In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando), the head of the Corleone Mafia family, is known to friends and associates as Godfather. He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughters wedding day.'

# Split the paragraph into lines of at most 30 characters each.
x = wrap(text, width=30)

# Print the wrapped lines one by one.
for line in x:
    print(line)
When we run the above program, we get the following output:
In late summer 1945, guests
are gathered for the wedding
reception of Don Vito
Corleones daughter Connie
(Talia Shire) and Carlo Rizzi
(Gianni Russo). Vito (Marlon
Brando), the head of the
Corleone Mafia family, is
known to friends and
associates as Godfather. He
and Tom Hagen (Robert Duvall),
the Corleone family lawyer,
are hearing requests for
favors because, according to
Italian tradition, no Sicilian
can refuse a request on his
daughters wedding day.
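Besides wrap, the module also provides fill, which returns the wrapped paragraph as a single string, and both functions accept indentation options. The sketch below is an illustration rather than part of the original example: it uses the standard-library textwrap module (which textwrap3 backports), a shortened sample text, and indent widths chosen arbitrarily.

```python
import textwrap

# Shortened sample text, chosen for illustration.
text = ("Vito, the head of the Corleone Mafia family, is known "
        "to friends and associates as Godfather.")

# fill() is wrap() plus "\n".join(): it returns one string.
# initial_indent applies to the first line, subsequent_indent to the rest;
# both count toward the 30-character width.
formatted = textwrap.fill(
    text,
    width=30,
    initial_indent="    ",
    subsequent_indent="  ",
)
print(formatted)
```

Because the indents are included in the width, the visible text on each line is correspondingly shorter than 30 characters.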
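Variable Indentation
In this example we adjust the indentation of each line of a poem before printing it. The lines are read from a file with uneven indentation, and each line is passed through dedent and strip, which removes its leading whitespace.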
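import textwrap3

# Placeholder path to the poem; the raw string keeps the backslash literal.
file_name = r"path\poem.txt"

# Read the file once and reuse the lines for both passes.
with open(file_name) as poem_file:
    data = poem_file.readlines()

print("**Before Formatting**")
print(" ")
for line in data:
    print(line.rstrip("\n"))

print(" ")
print("**After Formatting**")
print(" ")
for line in data:
    # dedent() plus strip() removes the leading whitespace of each line.
    dedented_text = textwrap3.dedent(line).strip()
    print(dedented_text)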
When we run the above program, we get the following output:
**Before Formatting**

 Summer is here.
  Sky is bright.
   Birds are gone.
    Nests are empty.
     Where is Rain?

**After Formatting**

Summer is here.
Sky is bright.
Birds are gone.
Nests are empty.
Where is Rain?
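Two details of indentation handling may be worth seeing side by side. Applied to a whole block, dedent removes only the whitespace that is common to every line, so relative indentation survives; and the companion function indent adds a prefix to each line. The sketch below is an illustration under assumptions: it uses the standard-library textwrap module (which textwrap3 backports), an inline three-line poem instead of a file, and a one-space-per-line step chosen arbitrarily.

```python
import textwrap

# Inline sample with a common margin of four spaces (illustrative).
poem = """\
    Summer is here.
      Sky is bright.
    Birds are gone.
"""

# dedent() on the whole block strips only the shared four-space margin,
# so the second line keeps its two extra spaces.
block = textwrap.dedent(poem)
print(block)

# Increase the indentation by one space on each successive line:
# strip the existing whitespace, then add a prefix of n spaces.
stepped = "\n".join(
    textwrap.indent(line.strip(), " " * n)
    for n, line in enumerate(poem.splitlines())
)
print(stepped)
```

Calling dedent on one line at a time, as in the program above, removes all of that line's leading whitespace, because a single line's entire margin counts as common.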