如何使用 Python 爬取媒体文件?
简介
在实际企业的商业环境中,大多数数据可能不会存储在文本或 Excel 文件中。诸如 Oracle、SQL Server、PostgreSQL 和 MySQL 等基于 SQL 的关系数据库广泛使用,并且许多备用数据库已经变得非常流行。
数据库的选择通常取决于应用程序的性能、数据完整性和可扩展性需求。
如何操作
在此示例中,我们将介绍如何创建 sqlite3 数据库。sqllite 在默认情况下与 Python 安装一起安装,并且不需要任何进一步的安装。如果您不确定,请尝试以下操作。我们还将导入 Pandas。
将数据从 SQL 加载到 DataFrame 是相当直接的,而 pandas 有一些函数可以简化此过程。
import sqlite3
import pandas as pd
print(f"Output \n {sqlite3.version}")输出
2.6.0
输出
# connection object
conn = sqlite3.connect("example.db")
# customers data
customers = pd.DataFrame({
"customerID" : ["a1", "b1", "c1", "d1"]
, "firstName" : ["Person1", "Person2", "Person3", "Person4"]
, "state" : ["VIC", "NSW", "QLD", "WA"]
})
print(f"Output \n *** Customers info -\n {customers}")输出
*** Customers info - customerID firstName state 0 a1 Person1 VIC 1 b1 Person2 NSW 2 c1 Person3 QLD 3 d1 Person4 WA
# orders data
orders = pd.DataFrame({
"customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"]
, "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"]
})
print(f"Output \n *** orders info -\n {orders}")输出
*** orders info - customerID productName 0 a1 road bike 1 a1 mountain bike 2 a1 helmet 3 d1 gloves 4 c1 road bike 5 c1 glasses
# write to the db
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", conn, if_exists="replace", index=False)输出
# frame an sql to fetch the data. q = """ select orders.customerID, customers.firstName, count(*) as productQuantity from orders left join customers on orders.customerID = customers.customerID group by customers.firstName; """
输出
# run the sql. pd.read_sql_query(q, con=conn)
示例
7. 将它们全部组合在一起。
import sqlite3
import pandas as pd
print(f"Output \n {sqlite3.version}")
# connection object
conn = sqlite3.connect("example.db")
# customers data
customers = pd.DataFrame({
"customerID" : ["a1", "b1", "c1", "d1"]
, "firstName" : ["Person1", "Person2", "Person3", "Person4"]
, "state" : ["VIC", "NSW", "QLD", "WA"]
})
print(f"*** Customers info -\n {customers}")
# orders data
orders = pd.DataFrame({
"customerID" : ["a1", "a1", "a1", "d1", "c1", "c1"]
, "productName" : ["road bike", "mountain bike", "helmet", "gloves", "road bike", "glasses"]
})
print(f"*** orders info -\n {orders}")
# write to the db
customers.to_sql("customers", con=conn, if_exists="replace", index=False)
orders.to_sql("orders", conn, if_exists="replace", index=False)
# frame an sql to fetch the data.
q = """
select orders.customerID, customers.firstName, count(*) as productQuantity
from orders
left join customers
on orders.customerID = customers.customerID
group by customers.firstName;
"""
# run the sql.
pd.read_sql_query(q, con=conn)输出
2.6.0 *** Customers info - customerID firstName state 0 a1 Person1 VIC 1 b1 Person2 NSW 2 c1 Person3 QLD 3 d1 Person4 WA *** orders info - customerID productName 0 a1 road bike 1 a1 mountain bike 2 a1 helmet 3 d1 gloves 4 c1 road bike 5 c1 glasses customerID firstName productQuantity ____________________________________ 0 a1 Person1 3 1 c1 Person3 2 2 d1 Person4 1
广告
数据结构
网络
关系型数据库管理系统
操作系统
Java
iOS
HTML
CSS
Android
Python
C 编程
C++
C#
MongoDB
MySQL
Javascript
PHP