How to remove empty tags using BeautifulSoup in Python?


BeautifulSoup is a Python library for extracting data from HTML and XML files. It can also be used to remove empty tags from an HTML or XML document, which leaves cleaner markup that is easier to read.

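First, install the BeautifulSoup library in the local environment with the command: pip install beautifulsoup4. The example below uses the "lxml" parser, which is a separate package that can be installed with: pip install lxml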
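Since lxml is not installed together with beautifulsoup4, a script may want to fall back to Python's built-in parser when lxml is missing. The following is a minimal sketch of that idea; the helper name make_soup is hypothetical and not part of the original example.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse html with lxml if it is available, else with the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is not installed
        return BeautifulSoup(html, "html.parser")


soup = make_soup("<p>hi</p>")
print(soup.p.get_text())
```

Note that the two parsers can build slightly different trees for the same input. For instance, lxml wraps a fragment in html and body tags, while html.parser does not.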

Example

# Import the BeautifulSoup class
from bs4 import BeautifulSoup

# The HTML document to clean
html_object = """
<p>Python is an interpreted, high-level and general-purpose
programming language. Python's design
philosophy emphasizes code readability with its notable use of
significant indentation.</p>
"""

# Parse the HTML document (the "lxml" parser requires the lxml package)
soup = BeautifulSoup(html_object, "lxml")

# Visit every tag and remove those that contain no text.
# Note: this also removes void tags such as <br> or <img>, which never contain text.
for x in soup.find_all():
    if len(x.get_text(strip=True)) == 0:
        x.extract()

print(soup)

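Output

Running the code above prints the parsed document. The lxml parser wraps the fragment in html and body tags. Because this sample contains no empty tags, the paragraph itself is left unchanged; any tag without text would have been removed.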

<html><body><p>Python is an interpreted, high-level and general-purpose
programming language. Python's design
philosophy emphasizes code readability with its notable use of
significant indentation.</p>
</body></html>
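The sample document contains no empty tags, so the removal step has nothing to do. The sketch below runs the same get_text check on a small document that does contain empty tags. The function name remove_empty_tags, the VOID_TAGS list, and the sample markup are assumptions made for illustration. It uses the built-in "html.parser" so that no extra package is needed, and it skips a few void tags such as img so that they are not mistaken for empty ones.

```python
from bs4 import BeautifulSoup

# Tags that never contain text but are still meaningful (illustrative, not exhaustive)
VOID_TAGS = ["br", "hr", "img", "input"]


def remove_empty_tags(html):
    """Return html with text-less tags removed, keeping void tags like <img>."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all():
        # Keep void tags and any tag that contains one
        if tag.name in VOID_TAGS or tag.find(VOID_TAGS) is not None:
            continue
        # Same test as the example above: no text after stripping whitespace
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()
    return str(soup)


html = '<div><p>Hello</p><p></p><span>  </span><img src="a.png"/></div>'
cleaned = remove_empty_tags(html)
print(cleaned)
```

Here the empty p and the whitespace-only span are dropped, while the paragraph with text and the img tag are kept. Whether void tags should be preserved depends on the document, so the VOID_TAGS list should be adjusted to suit.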

Updated on: 06-Mar-2021
