Python Pandas - Comparing Categorical Data

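Comparing categorical data is an essential task for gaining insights and understanding the relationships between the categories in your data. In Python, Pandas supports comparisons on categorical data with the comparison operators (==, !=, >, >=, <, and <=). These comparisons fall into three main scenarios:

Equality comparisons (== and !=).
All comparisons (==, !=, >, >=, <, and <=).
Comparing categorical data to a scalar value.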

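Note that any non-equality comparison (>, >=, <, <=) between categorical data with different categories or ordering, or between a categorical Series and a list-like object, raises a TypeError. This is because a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.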
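
The two interpretations can be made explicit. The following is a minimal sketch, assuming a small illustrative Series (the names `dtype`, `other`, `by_category`, and `by_value` are not part of the tutorial's examples): the list is either given the same categorical dtype, so that the category order is used, or both sides are converted to plain arrays, so that the category order is ignored.

```python
import numpy as np
import pandas as pd

# Ordered categories: 3 < 2 < 1 (illustrative dtype)
dtype = pd.CategoricalDtype([3, 2, 1], ordered=True)
s = pd.Series([1, 2, 3], dtype=dtype)
other = [2, 2, 2]

# An ordering comparison against a plain list is rejected
try:
    s > other
    raised = False
except TypeError:
    raised = True
print("TypeError raised:", raised)

# Interpretation 1: use the category order by giving the list the same dtype
by_category = s > pd.Series(other, dtype=dtype)
print(by_category.tolist())

# Interpretation 2: ignore the category order and compare the plain values
by_value = np.asarray(s) > np.asarray(other)
print(by_value.tolist())
```

With the category order 3 < 2 < 1, only the value 1 ranks above 2, whereas the plain numeric comparison marks only the value 3 as greater than 2.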

In this tutorial, we will learn how to compare categorical data in the Python Pandas library using comparison operators such as ==, !=, >, >=, <, and <=.

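Equality Comparison of Categorical Data

In Pandas, categorical data can be compared for equality with various objects, such as lists, arrays, or Series objects of the same length as the categorical data.

Example

The following example demonstrates how to perform equality and inequality comparisons between a categorical Series and list-like objects.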

import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np

# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Equality comparison
print("Equality comparison (s == s2):")
print(s == s2)

print("\nInequality comparison (s != s2):")
print(s != s2)

# Equality comparison with a NumPy array
print("\nEquality comparison with NumPy array:")
print(s == np.array([1, 2, 3, 1, 2, 3, 2, 1]))

Following is the output of the above code:

Equality comparison (s == s2):
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
dtype: bool

Inequality comparison (s != s2):
0     True
1    False
2     True
3    False
4     True
5    False
6     True
7    False
dtype: bool

Equality comparison with NumPy array:
0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool
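
Equality comparisons do not require the categorical data to be ordered. The sketch below, which assumes an illustrative unordered dtype and the hypothetical names `a` and `b`, shows == and != working on unordered categorical data, while an ordering operator such as < is rejected.

```python
import pandas as pd

# Unordered categorical dtype shared by both Series (illustrative)
dtype = pd.CategoricalDtype(["x", "y", "z"])
a = pd.Series(["x", "y", "z"], dtype=dtype)
b = pd.Series(["x", "z", "z"], dtype=dtype)

# Equality and inequality work without an order
equal = a == b
not_equal = a != b
print(equal.tolist())
print(not_equal.tolist())

# Equality against a plain list of the same length also works
equal_list = a == ["x", "y", "y"]
print(equal_list.tolist())

# Ordering operators need ordered categories
try:
    a < b
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)
```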

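All Comparisons of Categorical Data

Pandas lets you perform the ordering comparisons (>, >=, <=, <) between ordered categorical data, provided both sides have the same categories in the same order.

Example

This example demonstrates how to perform non-equality comparisons (>, >=, <=, <) on ordered categorical data.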

import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np

# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Greater than comparison 
print("Greater than comparison:\n",s > s2)

# Less than comparison 
print("\nLess than comparison:\n",s < s2)

# Greater than or equal to comparison 
print("\nGreater than or equal to comparison:\n",s >= s2)

# Less than or equal to comparison
print("\nLess than or equal to comparison:\n",s <= s2)

Following is the output of the above code:

Greater than comparison: 
0     True
1    False
2     True
3    False
4    False
5    False
6     True
7    False
dtype: bool

Less than comparison: 
0    False
1    False
2    False
3    False
4     True
5    False
6    False
7    False
dtype: bool

Greater than or equal to comparison: 
0     True
1     True
2     True
3     True
4    False
5     True
6     True
7     True
dtype: bool

Less than or equal to comparison: 
0    False
1     True
2    False
3     True
4     True
5     True
6    False
7     True
dtype: bool
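
The same behavior may be easier to see with string labels, where the category order differs from alphabetical order. This is a small sketch under assumed, illustrative names (`sizes`, `a`, `b`); it contrasts a comparison that follows the category order with a plain string comparison.

```python
import pandas as pd

# Ordered categories: small < medium < large (illustrative labels)
sizes = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
a = pd.Series(["small", "large", "medium"], dtype=sizes)
b = pd.Series(["medium", "medium", "medium"], dtype=sizes)

# Comparison that follows the category order
by_category = a > b
print(by_category.tolist())

# Plain string comparison, which follows alphabetical order instead
by_string = a.astype(str) > b.astype(str)
print(by_string.tolist())
```

By category order only "large" ranks above "medium", while alphabetically only "small" sorts after "medium".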

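Comparing Categorical Data to Scalars

Categorical data can also be compared to a scalar value with all comparison operators (==, !=, >, >=, <, and <=). Each categorical value is compared to the scalar according to the order of the categories, not the numeric or alphabetical order of the values. For the ordering operators, the data must be ordered and the scalar must be one of its categories.

Example

The following example demonstrates how to compare categorical data to a scalar value.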

import pandas as pd

# Creating a categorical Series
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# Compare to a scalar
print("Comparing categorical data to a scalar:")
print(s > 2)

Following is the output of the above code:

Comparing categorical data to a scalar:
0     True
1    False
2    False
dtype: bool
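
A scalar that is not one of the categories behaves differently depending on the operator. The following sketch reuses the Series from the example above and assumes the value 5 purely for illustration: equality operators return a result, while an ordering operator raises a TypeError.

```python
import pandas as pd

# Ordered categories: 3 < 2 < 1
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# 5 is not a category: == is False everywhere, != is True everywhere
eq_result = s == 5
ne_result = s != 5
print(eq_result.tolist())
print(ne_result.tolist())

# An ordering comparison with a scalar outside the categories is rejected
try:
    s > 5
    raised = False
except TypeError as e:
    raised = True
    print("TypeError:", e)
```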

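Comparing Categorical Data with Different Categories

Comparing two categorical Series that have different categories or a different category order raises a TypeError.

Example

The following example demonstrates how to handle the TypeError raised when comparing two categorical Series objects with different categories or order.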

import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np

# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Creating another categorical Series for comparison
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

try:
    print("Attempting to compare differently ordered two Series objects:")
    print(s > s3)
except TypeError as e:
    print("TypeError:", str(e))

Following is the output of the above code:

Attempting to compare differently ordered two Series objects:
TypeError: Categoricals can only be compared if 'categories' are the same.
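
One possible way to make such a comparison work is to give both Series the same categorical dtype first. The sketch below is one option among several, not the only approach: it casts `s3` to the dtype of `s` with astype (the name `s3_aligned` is illustrative), which keeps the values and changes only the category order.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Ordered categories: 3 < 2 < 1
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

# Categories inferred from the data, so the order is 1 < 2 < 3
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

# Recast s3 so that it shares the categories and order of s
s3_aligned = s3.astype(s.dtype)

# The comparison now follows the shared order 3 < 2 < 1
result = s > s3_aligned
print(result.tolist())
```

Whether the order of `s` or of `s3` is the right one to adopt depends on the data, so the choice made here is only an assumption for the sake of the example.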
