URL Handling Python Module (urllib)

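Python is widely used for web programming. When we browse a website, we use a web address, also called a URL (Uniform Resource Locator). Python has built-in tools for requesting URLs and for handling the results those requests return. In this article we look at a module named **urllib** and at several of its functions that help fetch results from a URL.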

Installing urllib

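urllib is part of the Python 3 standard library, so it does not need to be installed with pip. Importing the submodule you need is enough:

import urllib.request

If this import runs without an error, the module is ready to use.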

Opening a URL

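The function urllib.request.urlopen() opens a URL and returns a response object. Calling read() on that object loads the page content into the Python program as bytes.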


Example

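import urllib.request

# Open the URL; the response is closed automatically when the with block ends
with urllib.request.urlopen('https://tutorialspoint.com/') as address:
    # read() returns the raw page content as bytes
    print(address.read())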

Output

Running the above code gives us the following result:

b'<!DOCTYPE html>\r\n<!--[if IE 8]><html class="ie ie8"> <![endif]-->\r\n<!--[if IE 9]><html class……..
……………
……………….
new Date());\r\ngtag(\'config\', \'UA-232293-6\');\r\n</script>\r\n</body>\r\n</html>\r\n'
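The example above prints the body as raw bytes. Below is a slightly more defensive sketch: the hypothetical helper fetch_text() closes the response with a with block, passes a timeout, decodes the body using the charset declared in the Content-Type header (falling back to UTF-8, which is an assumption), and returns None when urllib.error.URLError is raised. To keep the demonstration runnable without a network connection it opens a data: URL, which urlopen() also supports; for a real page you would pass an address such as 'https://tutorialspoint.com/' instead.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a URL and return its body as text, or None if the request fails.

    fetch_text is a hypothetical helper written for this example.
    """
    try:
        # The with block closes the response even if reading fails
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
            # Use the charset from the Content-Type header; assume UTF-8 if absent
            charset = response.headers.get_content_charset() or "utf-8"
            return body.decode(charset)
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so error status codes end up here too
        print(f"Could not fetch {url}: {exc.reason}")
        return None


# A data: URL needs no network connection, so this line runs offline
print(fetch_text("data:text/plain;charset=utf-8,Hello%2C%20urllib"))
# An unsupported scheme makes urlopen() raise URLError
print(fetch_text("foo://example"))
```

Catching URLError in one place keeps the caller simple; if you need to tell a 404 apart from a DNS failure, catch urllib.error.HTTPError first and inspect its code attribute.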

urllib.parse

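The urllib.parse module breaks URLs into their components and encodes data so that it can be sent in a URL or in a request body. In the example below, urllib.parse.urlencode() turns the dictionary {'q': 'python'} into the query string q=python. Because the encoded bytes are passed as the data argument of Request, urlopen() sends them in a POST request. If the server answers with an error status, urlopen() raises urllib.error.HTTPError, so once the call returns we can print the response object and the whole response body.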

Example

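import urllib.request
import urllib.parse

url = 'https://tutorialspoint.com'
values = {'q': 'python'}
# Encode the dictionary as a query string: 'q=python'
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # the request body must be bytes
print(data)
# Supplying data makes this a POST request
req = urllib.request.Request(url, data)
# urlopen() raises urllib.error.HTTPError if the server returns an error status
resp = urllib.request.urlopen(req)
print(resp)
respData = resp.read()
print(respData)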

Output

Running the above code gives us the following result:

b'q=python'
<http.client.HTTPResponse object at 0x00000195BF706850>
b'<!DOCTYPE html>\r\n<!--[if IE 8]><html class="ie ie8"> <![endif]…………
…………………
\r\n</script>\r\n</body>\r\n</html>\r\n'
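urlencode() is equally useful for GET requests. The sketch below only constructs urllib.request.Request objects (nothing is sent over the network) to show that the same encoded string goes into the URL for a GET request and into the body for a POST request; get_method() reports which HTTP method urllib would use. parse_qs() reverses the encoding. The 'page' parameter and the '/?' query form are hypothetical, chosen for illustration rather than taken from the site.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

base_url = "https://tutorialspoint.com"
# 'page' is a hypothetical parameter added only to show how several values are joined
params = {"q": "python", "page": 2}

query = urlencode(params)
print(query)  # q=python&page=2

# GET: the query string is appended to the URL
get_req = Request(base_url + "/?" + query)
print(get_req.get_method(), get_req.full_url)

# POST: the same string, encoded to bytes, becomes the request body
post_req = Request(base_url, data=query.encode("utf-8"))
print(post_req.get_method(), post_req.data)

# parse_qs() turns a query string back into a dictionary of lists
print(parse_qs(query))

# Spaces become '+' and reserved characters such as '&' are percent-encoded
print(urlencode({"q": "python 3 & urllib"}))
```

Letting urlencode() do the escaping avoids building query strings by hand, where an unescaped '&' or space would silently change the meaning of the request.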

urllib.parse.urlsplit

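urlsplit() takes a URL and splits it into its components, which can then be used for further processing. For example, to check programmatically whether a URL uses HTTPS (HTTP secured with SSL/TLS), we can apply urlsplit() and inspect the scheme value. The scheme only tells us which protocol the URL requests; it does not verify the site's certificate. In the example below, we examine the different parts of the given URL.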

Example

import urllib.parse

url = 'https://tutorialspoint.com/python'
# Split the URL into scheme, netloc, path, query and fragment
value = urllib.parse.urlsplit(url)
print(value)

Output

Running the above code gives us the following result:

SplitResult(scheme='https', netloc='tutorialspoint.com', path='/python', query='', fragment='')
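Besides the tuple printed above, the SplitResult returned by urlsplit() exposes each part as an attribute, plus derived values such as hostname and port. The sketch below uses the article's URL with an explicit port, a query and a fragment added for illustration. It reads the parts, rebuilds a modified URL with urlunsplit(), and defines a hypothetical helper uses_https() for the scheme check. The helper only inspects the scheme and does not validate any certificate.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Port, query and fragment are added to the article's URL purely for illustration
url = "https://tutorialspoint.com:443/python?topic=urllib#intro"
parts = urlsplit(url)

print(parts.scheme)           # -> https
print(parts.netloc)           # -> tutorialspoint.com:443
print(parts.hostname)         # -> tutorialspoint.com
print(parts.port)             # -> 443 (an int, or None when no port is given)
print(parts.path)             # -> /python
print(parse_qs(parts.query))  # -> {'topic': ['urllib']}
print(parts.fragment)         # -> intro

# SplitResult is a named tuple, so _replace() returns a modified copy
plain = parts._replace(scheme="http", netloc="tutorialspoint.com")
print(urlunsplit(plain))      # -> http://tutorialspoint.com/python?topic=urllib#intro


def uses_https(address: str) -> bool:
    """Hypothetical helper: True if the URL's scheme is https.

    urlsplit() lowercases the scheme, so 'HTTPS://...' also matches.
    This says nothing about whether the server's certificate is valid.
    """
    return urlsplit(address).scheme == "https"


print(uses_https(url), uses_https(urlunsplit(plain)))  # -> True False
```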

Pradeep Elance

Updated on: February 14, 2020
