Python Pandas - Boolean Masking

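Boolean masking in Pandas is a useful technique for filtering data based on specific conditions. It works by creating a boolean mask and applying it to a DataFrame or Series to select the data that meets the given condition. A boolean mask is a DataFrame or Series in which every element is either True or False. When you apply the mask to your data, only the rows or columns for which the condition holds are selected.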
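
As a minimal sketch of this idea (the Series `s` and its values are made up for illustration), a comparison on a Series produces a mask of True/False values, and indexing with that mask keeps only the positions where it is True:

```python
import pandas as pd

# Hypothetical data, used only for illustration
s = pd.Series([10, 25, 5, 40])

# A comparison returns a boolean Series of the same length as s
mask = s > 20
print(mask.tolist())

# Indexing with the mask keeps only the elements where it is True
selected = s[mask]
print(selected.tolist())
```

The mask has the same index as the data it was built from, which is how Pandas lines the two up when the mask is applied.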

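In this tutorial, we will learn how to apply boolean masks in Pandas and filter data based on index and column values.

Selecting Data Using a Boolean Mask

Data in a DataFrame is selected or filtered by creating a boolean mask that defines the condition the rows must satisfy.

Example

The following example demonstrates how to filter data using a boolean mask.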

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Col1': [1, 3, 5, 7, 9],
                   'Col2': ['A', 'B', 'A', 'C', 'A']})

# Display the input DataFrame
print('Original DataFrame:\n', df)

# Create a boolean mask: Col2 equals 'A' and Col1 is greater than 4
mask = (df['Col2'] == 'A') & (df['Col1'] > 4)

# Apply the mask to the DataFrame
filtered_data = df[mask]

print('Filtered Data:\n', filtered_data)

The code above produces the following output:

Original DataFrame:
    Col1 Col2
0     1    A
1     3    B
2     5    A
3     7    C
4     9    A
Filtered Data:
    Col1 Col2
2     5    A
4     9    A
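
The example above combines two conditions with `&`. As a hedged sketch of the other element-wise operators, the snippet below reuses the same sample DataFrame to show `|` (or), `~` (not), and `.loc` with a row mask plus a column label. The variable names `either`, `not_a`, and `col1_values` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 3, 5, 7, 9],
                   'Col2': ['A', 'B', 'A', 'C', 'A']})

# | means "or"; each comparison needs its own parentheses
# because & and | bind more tightly than == and >
either = (df['Col2'] == 'B') | (df['Col1'] > 6)

# ~ inverts a mask: True where Col2 is not 'A'
not_a = ~(df['Col2'] == 'A')

# .loc accepts a row mask together with a column label
col1_values = df.loc[either, 'Col1']

print(df[not_a])
print(col1_values.tolist())
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the symbolic operators are used here.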

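Masking Data Based on Index Values

You can filter data by the index values of a DataFrame by creating a mask for the index, which lets you select rows by their position or label.

Example

This example uses the **df.index.isin()** method to create a boolean mask from index labels.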

import pandas as pd

# Create a DataFrame with a custom index
df = pd.DataFrame({'A1': [10, 20, 30, 40, 50], 'A2': [9, 3, 5, 3, 2]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Display the input DataFrame
print('Original DataFrame:\n', df)

# Define a mask based on the index labels
mask = df.index.isin(['b', 'd'])

# Apply the mask
filtered_data = df[mask]

print('Filtered Data:\n', filtered_data)

The code above produces the following output:

Original DataFrame:
     A1  A2
a  10   9
b  20   3
c  30   5
d  40   3
e  50   2
Filtered Data:
     A1  A2
b  20   3
d  40   3
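
The example above builds the mask from index labels. A mask can also be derived from row positions. The following is one possible sketch, assuming NumPy is available; the choice of "every other row" and the names `position_mask` and `by_position` are purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A1': [10, 20, 30, 40, 50], 'A2': [9, 3, 5, 3, 2]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Build a mask from row positions: True at positions 0, 2 and 4
position_mask = np.arange(len(df)) % 2 == 0

# A boolean NumPy array of matching length can be used like a Series mask
by_position = df[position_mask]
print(by_position)
```

Because the mask is a plain array rather than a Series, it is applied purely by position and the index labels play no part in the selection.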

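Masking Data Based on Column Values

In addition to filtering on index values, you can use a boolean mask to filter data on specific column values. The **isin()** method, called here on a column, checks whether each value in the column matches any value in a given list.

Example

The following example demonstrates how to create and apply a boolean mask to select data based on DataFrame column values.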

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})

# Display the input DataFrame
print('Original DataFrame:\n', df)

# Define a mask: 'A' is 1 or 3, or 'B' is 'a'
mask = df['A'].isin([1, 3]) | df['B'].isin(['a'])

# Apply the mask using boolean indexing
filtered_data = df[mask]

print('Filtered Data:\n', filtered_data)

The code above produces the following output:

Original DataFrame:
    A  B
0  1  a
1  2  b
2  3  f
Filtered Data:
    A  B
0  1  a
2  3  f
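
The example above calls `isin()` on individual columns. As an alternative sketch, `DataFrame.isin()` accepts a dict that maps column names to allowed values and returns a True/False value for every cell; reducing it with `any(axis=1)` gives a row mask equivalent to the `|` combination used above. The names `cell_mask` and `row_mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})

# Per-cell mask: each column is checked against its own list of values
cell_mask = df.isin({'A': [1, 3], 'B': ['a']})

# Collapse to one value per row: True if any column matched
row_mask = cell_mask.any(axis=1)

filtered = df[row_mask]
print(filtered)
```

Replacing `any(axis=1)` with `all(axis=1)` would instead keep only the rows where every listed column matched.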
