Support for HyperText Markup Language (HTML) in Python
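Python can process HTML files with the HTMLParser class from the html.parser module. The parser reports each tag it encounters, the tag's position (line number and offset) in the input, and the tag's attributes. It also provides handler methods that identify and extract the data contained in an HTML file.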
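As a minimal sketch of the tag, position, and attribute detection described above, the snippet below feeds a short HTML string to a subclass of HTMLParser. The class name AttrCollector, its found list, and the sample link are hypothetical names chosen for illustration; handle_starttag, getpos, and feed are methods of HTMLParser itself.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records each start tag with its attributes and position."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; getpos() returns (line, offset)
        self.found.append((tag, dict(attrs), self.getpos()))


collector = AttrCollector()
collector.feed('<a href="https://example.com" class="link">home</a>')
print(collector.found)
```

Storing the events in a list rather than printing them makes the result easier to reuse, but that is a design choice of this sketch, not something the module requires.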
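In the example below, we use the HTMLParser class to create a custom parser class. The custom class reacts only to the events for which we define handler methods. Here we handle start tags, end tags, and data.
Below is the HTML that the custom Python parser processes.
Example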
<html>
<head>
<title>welcome to Tutorials Point!</title>
</head>
<body>
<h1>Learn anything !</h1>
</body>
</html>
The following program parses the file above and prints the results produced by the custom parser.
Example
from html.parser import HTMLParser

class Custom_Parser(HTMLParser):
    # Called for every opening tag, e.g. <html>
    def handle_starttag(self, tag, attrs):
        print("Line and Offset ==", self.getpos())
        print("Encountered a start tag:", tag)

    # Called for every closing tag, e.g. </html>
    def handle_endtag(self, tag):
        print("Line and Offset ==", self.getpos())
        print("Encountered an end tag :", tag)

    # Called for text between tags, including the newlines that separate them
    def handle_data(self, data):
        print("Line and Offset ==", self.getpos())
        print("Encountered some data :", data)

parser = Custom_Parser()
# Raw string, so the "\t" in the Windows path is not read as a tab character
with open(r"E:\test.html", "r") as stream:
    parser.feed(stream.read())
Output
Running the above code gives the following result:
Line and Offset == (1, 0)
Encountered a start tag: html
Line and Offset == (1, 6)
Encountered some data :
Line and Offset == (2, 0)
Encountered a start tag: head
Line and Offset == (2, 6)
Encountered some data :
Line and Offset == (3, 0)
Encountered a start tag: title
Line and Offset == (3, 7)
Encountered some data : welcome to Tutorials Point!
Line and Offset == (3, 34)
Encountered an end tag : title
Line and Offset == (3, 42)
Encountered some data :
Line and Offset == (4, 0)
Encountered an end tag : head
Line and Offset == (4, 7)
Encountered some data :
Line and Offset == (5, 0)
Encountered a start tag: body
Line and Offset == (5, 6)
Encountered some data :
Line and Offset == (6, 0)
Encountered a start tag: h1
Line and Offset == (6, 4)
Encountered some data : Learn anything !
Line and Offset == (6, 20)
Encountered an end tag : h1
Line and Offset == (6, 25)
Encountered some data :
Line and Offset == (7, 0)
Encountered an end tag : body
Line and Offset == (7, 7)
Encountered some data :
Line and Offset == (8, 0)
Encountered an end tag : html
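Many of the data events in this output carry nothing but the newline between two tags. The sketch below shows one possible way to keep only the meaningful text, paired with the tag that encloses it. The class name TextExtractor and its open_tags and texts attributes are hypothetical names introduced for this illustration, and the HTML is passed as a string so that the snippet runs without a file on disk.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: gathers non-blank text together with its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.texts = []      # (enclosing tag, text) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop only when the end tag matches the innermost open tag
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        # Skip whitespace-only data such as the newlines between tags
        if text and self.open_tags:
            self.texts.append((self.open_tags[-1], text))


html_doc = """<html>
<head>
<title>welcome to Tutorials Point!</title>
</head>
<body>
<h1>Learn anything !</h1>
</body>
</html>"""

extractor = TextExtractor()
extractor.feed(html_doc)
extractor.close()
print(extractor.texts)
```

The tag stack here assumes well-nested markup. Void elements such as a bare br tag never receive an end tag, so a more robust version would need to treat them separately.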