Beautiful Soup - Find Elements by ID

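In an HTML document, an element can be given an id attribute, and its value is expected to be unique within the document. Front-end code, such as a JavaScript function, can then use the ID to locate the element and read its value.

With Beautiful Soup, you can look up an element by its ID in two ways: with the find() and find_all() methods, or with the CSS-selector methods select() and select_one().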

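Using the find() Method

The find() method of the BeautifulSoup object returns the first element that matches the criteria passed as arguments, or None if nothing matches.

Let us use the following HTML document, saved as index.html, as an example: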

<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      <form>
         <input type = 'text' id = 'nm' name = 'name'>
         <input type = 'text' id = 'age' name = 'age'>
         <input type = 'text' id = 'marks' name = 'marks'>
      </form>
   </body>
</html>

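The following Python code finds the element whose id is nm.

Example

from bs4 import BeautifulSoup

# Parse the sample page; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find() returns the first tag whose id attribute equals 'nm'
obj = soup.find(id='nm')
print(obj)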

Output

<input id="nm" name="name" type="text"/>
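Because find() returns None when no element has the requested ID, it is usually worth checking the result before reading attributes from it. The following is a minimal sketch of that check. It uses an inline copy of part of the sample form instead of index.html, and the ID email is a hypothetical value chosen because it does not appear in the markup.

```python
from bs4 import BeautifulSoup

# Inline copy of part of the sample form, so no index.html file is needed
html = """
<form>
   <input type='text' id='nm' name='name'>
   <input type='text' id='age' name='age'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

found = soup.find(id='nm')
missing = soup.find(id='email')   # hypothetical ID that is not in the markup

# A matched Tag exposes its attributes like a dictionary
found_name = found['name'] if found is not None else None
print(found_name)   # name
print(missing)      # None
```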

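Using find_all()

The find_all() method accepts the same filter arguments. It returns a list of all elements that have the given ID. Because a valid HTML document contains at most one element with a particular ID, find() is usually the better choice than find_all() when searching by ID.

Example

from bs4 import BeautifulSoup

# Parse the sample page; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find_all() returns every tag whose id attribute equals 'nm'
obj = soup.find_all(id='nm')
print(obj)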

Output

[<input id="nm" name="name" type="text"/>]
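One situation where find_all() can still be useful for ID lookups is malformed markup in which an ID is repeated. The sketch below assumes such a page: the markup is hypothetical and deliberately reuses the ID nm on two inputs to show that find_all() returns both of them.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed markup in which two elements share the ID 'nm'
html = """
<form>
   <input type='text' id='nm' name='first'>
   <input type='text' id='nm' name='second'>
   <input type='text' id='age' name='age'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

# find_all() collects every tag with a matching id, in document order
matches = soup.find_all(id='nm')
names = [tag['name'] for tag in matches]
print(len(matches))  # 2
print(names)         # ['first', 'second']
```

A result longer than one element is a quick way to detect duplicated IDs in a page.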

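Note that find_all() returns a list. It also accepts a limit argument. Calling find_all() with limit=1 stops after the first match, as find() does, but the result is still a list holding at most one element rather than the element itself.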

obj = soup.find_all(id = 'nm', limit=1)
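The difference in return type between the two calls can be seen in a short comparison. This is a minimal sketch using a one-line inline document; the ID missing is a hypothetical value that does not occur in it.

```python
from bs4 import BeautifulSoup

html = "<form><input type='text' id='nm' name='name'></form>"
soup = BeautifulSoup(html, 'html.parser')

as_list = soup.find_all(id='nm', limit=1)   # list with at most one tag
as_tag = soup.find(id='nm')                 # the tag itself, or None

# The single list entry is the same element that find() returns
same_element = as_list[0] == as_tag

# With no match, find_all() gives an empty list and find() gives None
empty_list = soup.find_all(id='missing', limit=1)
no_tag = soup.find(id='missing')

print(len(as_list), same_element)   # 1 True
print(len(empty_list), no_tag)      # 0 None
```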

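Using the select() Method

The select() method of the BeautifulSoup class accepts a CSS selector as its argument. In CSS, the # symbol introduces an ID selector: it is followed by the required ID value, and the resulting string is passed to select(). Like find_all(), select() returns a list of all matching elements.

Example

from bs4 import BeautifulSoup

# Parse the sample page; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# "#nm" is the CSS ID selector for the element whose id is 'nm'
obj = soup.select("#nm")
print(obj)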

Output

[<input id="nm" name="name" type="text"/>]
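Since select() takes a full CSS selector, an ID selector can be combined with other selector syntax. The sketch below, which works on an inline copy of the sample form, shows two such combinations: a comma-separated group that matches several IDs in one call, and a tag-qualified ID selector. It assumes the CSS selector support that normally ships with Beautiful Soup is installed.

```python
from bs4 import BeautifulSoup

# Inline copy of the sample form, so no index.html file is needed
html = """
<form>
   <input type='text' id='nm' name='name'>
   <input type='text' id='age' name='age'>
   <input type='text' id='marks' name='marks'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

# A selector group matches any of the listed IDs, in document order
ids = [tag['id'] for tag in soup.select('#nm, #marks')]

# Qualifying the ID with a tag name matches only <input> elements
age_inputs = soup.select('input#age')
age_names = [tag['name'] for tag in age_inputs]

print(ids)        # ['nm', 'marks']
print(age_names)  # ['age']
```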

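Using select_one()

Like find_all(), select() returns a list. There is also a select_one() method, which returns only the first tag that matches the given selector.

Example

from bs4 import BeautifulSoup

# Parse the sample page; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# select_one() returns the first matching tag instead of a list
obj = soup.select_one("#nm")
print(obj)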

Output

<input id="nm" name="name" type="text"/>
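A lookup by ID is often followed by reading one attribute of the matched element, so the two steps can be wrapped together. The helper below is a hypothetical convenience function, not part of Beautiful Soup; its name attr_by_id and its parameters are assumptions made for this sketch. It also assumes the ID is a simple value that is valid as a CSS identifier, because the ID is inserted into the selector without escaping.

```python
from bs4 import BeautifulSoup

# Inline copy of part of the sample form, so no index.html file is needed
html = """
<form>
   <input type='text' id='nm' name='name'>
   <input type='text' id='age' name='age'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

def attr_by_id(soup, element_id, attr, default=None):
    """Hypothetical helper: return one attribute of the element with the given ID."""
    # Assumes element_id needs no CSS escaping (letters, digits after the first
    # character, hyphens, underscores)
    tag = soup.select_one(f"#{element_id}")
    if tag is None:
        return default
    return tag.get(attr, default)

age_name = attr_by_id(soup, 'age', 'name')
email_name = attr_by_id(soup, 'email', 'name')   # 'email' is not in the markup
print(age_name)     # age
print(email_name)   # None
```

For IDs that contain characters with special meaning in CSS, soup.find(id=...) avoids the escaping question because it compares the attribute value directly.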
