Python Pandas - Boolean Indexing

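Boolean indexing in Pandas is an effective technique for filtering data based on specific conditions. It lets us build a mask (a filter) that extracts the subset of data meeting those conditions. Applying a condition to a Pandas data structure returns an object of the same shape that holds True or False for each element. These boolean values can then be used to filter a DataFrame or Series, giving selective access to the data that satisfies the condition.

In this tutorial, we will learn how to create a boolean index and how to use it with the .loc and .iloc indexers.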

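Creating a Boolean Index

A boolean index is created by applying a conditional expression to a DataFrame or Series object. For example, if you specify a condition that checks whether the values in a column are greater than a particular number, Pandas returns a Series of True and False values, which is the boolean index.

Example: Creating a Boolean Index

The following example demonstrates how to create a boolean index from a condition. Here the condition is applied to the whole DataFrame, so the result is a DataFrame of boolean values.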

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
result = df > 2

print('Boolean Index:\n', result)

Following is the output of the above code:

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Boolean Index:
        A      B
0  False  False
1   True   True
2   True   True
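
The condition can also be applied to a single column, which is the form used for row filtering in the rest of this tutorial. The following is a minimal sketch reusing the same DataFrame; the variable name mask is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Comparing one column yields a boolean Series that shares the DataFrame's index
mask = df['A'] > 2

print(mask.tolist())  # [False, True, True]
print(mask.dtype)     # bool
print(mask.sum())     # count of True values: 2
```

Because True counts as 1, summing the mask is a quick way to see how many rows satisfy the condition before filtering.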

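Filtering Data with a Boolean Index

Once we have a boolean index, we can use it to filter the rows or columns of a DataFrame. This is done with .loc[] for label-based indexing and .iloc[] for position-based indexing.

Example: Filtering Data with a Boolean Index Using .loc

The following example demonstrates filtering data with a boolean index using .loc. The boolean index selects the rows, and the column is specified by its label. Because a single column label is given, the result is a Series.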

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
s = (df['A'] > 2)

# Filter DataFrame using the Boolean Index with .loc
print('Output Filtered DataFrame:\n',df.loc[s, 'B'])

Following is the output of the above code:

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Output Filtered DataFrame:
 1    4
2    6
Name: B, dtype: int64
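
The example above returns a Series because one column label was passed. As a hedged sketch of two common variations, passing a list of labels keeps the result as a DataFrame, and plain brackets with a boolean Series filter rows while keeping every column. The names mask, subset, and rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
mask = df['A'] > 2

# A list of column labels keeps the result two-dimensional (a DataFrame)
subset = df.loc[mask, ['B']]

# Plain brackets with a boolean Series filter rows and keep all columns
rows = df[mask]

print(type(subset).__name__)  # DataFrame
print(rows.index.tolist())    # [1, 2]
```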

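Filtering Data with a Boolean Index Using .iloc

Similar to the approach above, .iloc can be used with a boolean index for position-based indexing.

Example: Using .iloc with a Boolean Index

This example uses .iloc for position-based indexing. .iloc does not accept a boolean Series as a mask, so the boolean index is first converted to an array with the .values attribute. The DataFrame can then be filtered in the same way as with .loc, with the column given by its integer position.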

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
s = (df['A'] > 2)

# Filter data using .iloc and the Boolean Index
print('Output Filtered Data:\n',df.iloc[s.values, 1])

Following is the output of the above code:

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Output Filtered Data:
 1    4
2    6
Name: B, dtype: int64
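
The conversion step in the example above can also be written with Series.to_numpy(), which returns the same boolean array as .values. The sketch below additionally checks what happens when the Series is passed to .iloc without conversion; the exact exception type is an assumption that may differ between pandas versions, so both likely types are caught. The names by_position and series_accepted are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
s = df['A'] > 2

# to_numpy() gives a plain boolean array that .iloc accepts as a row mask
by_position = df.iloc[s.to_numpy(), 1]

# Passing the boolean Series itself is expected to be rejected by .iloc
# (assumption: the error is NotImplementedError or ValueError)
try:
    df.iloc[s, 1]
    series_accepted = True
except (NotImplementedError, ValueError):
    series_accepted = False

print(by_position.tolist())
print(series_accepted)
```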

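Advanced Boolean Indexing

Pandas supports more complex boolean indexing by combining multiple conditions with the operators & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparison operators. The conditions can also span different columns to build highly specific filters.

Example: Using Multiple Conditions Across Columns

The following example demonstrates boolean indexing with multiple conditions applied across columns.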

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 3, 5, 7],'B': [5, 2, 8, 4],'C': ['x', 'y', 'x', 'z']})

# Display the DataFrame
print("Input DataFrame:\n", df)

# Apply multiple conditions using boolean indexing
result = df.loc[(df['A'] > 2) & (df['B'] < 5), 'A':'C']

print('Output Filtered DataFrame:\n',result)

Following is the output of the above code:

Input DataFrame:
    A  B  C
0  1  5  x
1  3  2  y
2  5  8  x
3  7  4  z
Output Filtered DataFrame:
    A  B  C
1  3  2  y
3  7  4  z
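
The example above combines conditions with & only. As a minimal sketch on the same data, the | and ~ operators work the same way; the names either and not_x are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [5, 2, 8, 4], 'C': ['x', 'y', 'x', 'z']})

# OR: rows where A is greater than 5 or C equals 'x'
either = df.loc[(df['A'] > 5) | (df['C'] == 'x')]

# NOT: rows where C is anything other than 'x'
not_x = df.loc[~(df['C'] == 'x')]

print(either.index.tolist())  # [0, 2, 3]
print(not_x.index.tolist())   # [1, 3]
```

Python's and, or, and not keywords cannot be used here, since they try to reduce a whole Series to a single truth value and raise an error.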
