Python - Processing PDFs
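Python can read PDF files and print their content after extracting the text from them. To do this, we first have to install the required module, PyPDF2. The command to install it is shown below; pip should already be installed in your Python environment.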
pip install pypdf2
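Once the module is installed, we can read PDF files with the methods it provides. The following program opens a PDF file, takes its first page and prints the text extracted from it.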
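import PyPDF2

# PyPDF2 2.x/3.x API; the old PdfFileReader, getPage() and extractText() names raise an error in 3.x
pdfName = r'path\Tutorialspoint.pdf'  # raw string keeps the backslash literal
read_pdf = PyPDF2.PdfReader(pdfName)  # open and parse the PDF file
page = read_pdf.pages[0]              # first page (pages are 0-indexed)
page_content = page.extract_text()    # text extracted from that page
print(page_content)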
When we run the above program, we get the following output:
Tutorials Point originated from the idea that there exists a class of readers who respond better to online content and prefer to learn new skills at their own pace from the comforts of their drawing rooms. The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, we worked our way to adding fresh tutorials to our repository which now proudly flaunts a wealth of tutorials and allied articles on topics ranging from programming languages to web designing to academics and much more.
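The program above reads only the first page. To collect the text of the whole document, the same extraction call can be applied to every page and the results joined. The following is a minimal sketch, assuming PyPDF2 2.x or 3.x, whose `PdfReader` exposes a `pages` sequence with an `extract_text()` method on each page; the function name `join_pdf_text` and its `separator` parameter are hypothetical. Because the function relies only on those two members, it does not import PyPDF2 itself.

```python
def join_pdf_text(reader, separator="\n"):
    """Return the extracted text of every page in `reader`, joined by `separator`.

    `reader` is assumed to behave like PyPDF2's PdfReader: it has a `pages`
    sequence, and each page provides an `extract_text()` method.
    """
    texts = []
    for page in reader.pages:
        # Treat a None result as an empty page so join() never fails.
        texts.append(page.extract_text() or "")
    return separator.join(texts)
```

For example, `print(join_pdf_text(PyPDF2.PdfReader(r'path\Tutorialspoint.pdf')))` would print the text of all pages of the file used above, one page after another.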
Reading Multiple Pages
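To read a PDF that has several pages and print each page separately together with its page number, we loop over the pages and use the get_page_number() method. In the following example, the PDF file contains two pages, and the content of each one is printed under its own page heading.
import PyPDF2

# PyPDF2 2.x/3.x API (PdfReader, pages, get_page_number(), extract_text())
pdfName = r'Path\Tutorialspoint2.pdf'  # raw string keeps the backslash literal
read_pdf = PyPDF2.PdfReader(pdfName)
for page in read_pdf.pages:
    # get_page_number() returns a 0-based index, so add 1 for display
    print('Page No - ' + str(1 + read_pdf.get_page_number(page)))
    page_content = page.extract_text()
    print(page_content)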
When we run the above program, we get the following output:
Page No - 1
Tutorials Point originated from the idea that there exists a class of readers who respond better to online content and prefer to learn new skills at their own pace from the comforts of their drawing rooms.
Page No - 2
The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated, we worked our way to adding fresh tutorials to our repository which now proudly flaunts a wealth of tutorials and allied articles on topics ranging from programming languages to web designing to academics and much more.
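Since `reader.pages` is an ordered sequence, the page number can also be taken directly from the loop position with Python's built-in `enumerate()`, instead of asking the reader for each page's number. Below is a minimal sketch under the same assumption (a PyPDF2 2.x/3.x `PdfReader`, or any object with a `pages` sequence whose items have `extract_text()`); the function names `numbered_pages` and `format_pages` are hypothetical. `format_pages` rebuilds the "Page No - N" layout of the example above as a single string.

```python
def numbered_pages(reader, start=1):
    """Return a list of (page_number, text) pairs, numbering pages from `start`."""
    # enumerate() supplies the position, so no per-page number lookup is needed.
    return [(number, page.extract_text() or "")
            for number, page in enumerate(reader.pages, start=start)]


def format_pages(reader):
    """Build the same 'Page No - N' output as the loop above, as one string."""
    lines = []
    for number, text in numbered_pages(reader):
        lines.append("Page No - " + str(number))
        lines.append(text)
    return "\n".join(lines)
```

For example, `print(format_pages(PyPDF2.PdfReader(r'Path\Tutorialspoint2.pdf')))` would print each page under its own heading. Taking the number from `enumerate()` keeps the loop independent of the reader's page-lookup method.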