Python Pandas - Stacking and Unstacking



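Stacking and unstacking in Pandas are useful techniques for reshaping a DataFrame so that you can extract more information from it by viewing it in different ways. They also work naturally with multi-level (hierarchical) indexes. Whether you are compressing columns into row levels or expanding row levels into columns, these operations are essential when working with complex datasets.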

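The Pandas library provides two main methods for these operations: stack() for stacking and unstack() for unstacking. In this tutorial, we will learn the stacking and unstacking techniques in Pandas, along with an example of handling missing data.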

Stacking in Pandas

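Stacking in Pandas is the process of compressing DataFrame columns into rows. The DataFrame.stack() method moves a level of the column labels into the row index. It pivots one level of the column labels (by default the innermost level, if the columns are hierarchical) into the row labels and returns a new object with a MultiIndex. The result is a Series if the DataFrame has a single level of column labels, or a DataFrame if it has more than one.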
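To compare the two possible return types side by side, here is a minimal sketch that uses small, hypothetical DataFrames with fixed values. The names df, wide, s, long and by_exp and the labels r1/r2 and cat/dog are for illustration only. It also shows the level argument, which selects a column level other than the innermost one. Depending on your pandas version, stack() may print a FutureWarning about its implementation. That warning does not change the results shown here.

```python
import pandas as pd

# Hypothetical DataFrame with a single level of column labels
df = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["A", "B"])

# One column level: stack() moves it entirely into the index, giving a Series
s = df.stack()
print(type(s).__name__)
print(s)

# Hypothetical DataFrame with two levels of column labels ("exp" and "animal")
cols = pd.MultiIndex.from_tuples(
    [("A", "cat"), ("A", "dog"), ("B", "cat"), ("B", "dog")],
    names=["exp", "animal"],
)
wide = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=["r1", "r2"], columns=cols)

# Two column levels: by default only the innermost level ("animal") is stacked,
# so the result is still a DataFrame whose columns are the "exp" labels
long = wide.stack()
print(type(long).__name__)
print(long)

# The level argument picks a different column level, here the outer "exp" level
by_exp = wide.stack(level="exp")
print(by_exp)
```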

Example

The following example uses the df.stack() method to pivot the columns into the row index.

import pandas as pd
import numpy as np

# Create MultiIndex
tuples = [["x", "x", "y", "y", "", "f", "z", "z"],["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(tuples, names=["first", "second"])

# Create a DataFrame
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])

# Display the input DataFrame
print('Input DataFrame:\n', df)

# Stack columns
stacked = df.stack()

print('Output Reshaped DataFrame:\n', stacked)

The output of the above code is as follows (the data is random, so your values will differ):

Input DataFrame:
                     A         B
first second                    
x     1       0.596485 -1.356041
      2      -1.091407  0.246216
y     1       0.499328 -1.346817
      2      -0.893557  0.014678
      1      -0.059916  0.106597
f     2      -0.315096 -0.950424
z     1       1.050350 -1.744569
      2      -0.255863  0.539803
Output Reshaped DataFrame:
first  second   
x      1       A    0.596485
               B   -1.356041
       2       A   -1.091407
               B    0.246216
y      1       A    0.499328
               B   -1.346817
       2       A   -0.893557
               B    0.014678
       1       A   -0.059916
               B    0.106597
f      2       A   -0.315096
               B   -0.950424
z      1       A    1.050350
               B   -1.744569
       2       A   -0.255863
               B    0.539803
dtype: float64

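Here, the stack() method pivots columns A and B into the index, compressing the DataFrame into long format. Because the DataFrame has only one level of column labels, the result is a Series with a three-level index: first, second, and the former column labels.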
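A stacked Series is often passed to reset_index() to produce a tidy long-format table, in which every index level becomes an ordinary column. The sketch below uses a small, hypothetical sales table. The labels store, quarter and sales are illustrative and do not come from the example above. Naming the column axis ("quarter") gives the new index level created by stack() that name.

```python
import pandas as pd

# Hypothetical wide table: one row per store, one column per quarter
wide = pd.DataFrame(
    {"Q1": [10, 20], "Q2": [11, 21]},
    index=pd.Index(["north", "south"], name="store"),
)
wide.columns.name = "quarter"  # becomes the name of the new index level

# stack() moves the column labels into a new, innermost index level
stacked = wide.stack()

# reset_index() turns the index levels into ordinary columns;
# name= sets the column name for the stacked values
long = stacked.reset_index(name="sales")
print(long)
```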

Unstacking in Pandas

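Unstacking reverses the stacking operation by moving a row index level back into the columns. The Pandas DataFrame.unstack() method pivots a row index level (by default the innermost one) into the column labels. This is useful for converting a long-format DataFrame into wide format.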
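Because unstack() reverses stack(), applying one after the other should return the original table when no labels are missing. Here is a minimal sketch of that round trip, using a hypothetical DataFrame with fixed values:

```python
import pandas as pd

# Hypothetical wide DataFrame with fixed values and a named row index
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=pd.Index(["x", "y", "z"], name="key"),
)

# stack() moves the column labels A and B into a new innermost index level ...
stacked = df.stack()

# ... and unstack() moves that innermost index level back into the columns
restored = stacked.unstack()

print(restored)
print(restored.equals(df))  # same values, labels and dtypes as the original
```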

Example

The following example demonstrates how the df.unstack() method works when unstacking a DataFrame.

import pandas as pd
import numpy as np

# Create MultiIndex
tuples = [["x", "x", "y", "y", "", "f", "z", "z"],["1", "2", "1", "2", "1", "2", "1", "2"]]
index = pd.MultiIndex.from_arrays(tuples, names=["first", "second"])

# Create a DataFrame
df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])

# Display the input DataFrame
print('Input DataFrame:\n', df)

# Unstack the DataFrame
unstacked = df.unstack()

print('Output Reshaped DataFrame:\n', unstacked)

The output of the above code is as follows:

Input DataFrame:
                     A         B
first second                    
x     1      -0.407537 -0.957010
      2       0.045479  0.789849
y     1       0.751488 -0.474536
      2      -1.043122 -0.015152
      1      -0.133349  1.094900
f     2       1.681111  2.480652
z     1       0.283679  0.769553
      2      -2.034907  0.301275
Output Reshaped DataFrame:
               A                   B
second         1         2         1         2
first
       -0.133349       NaN  1.094900       NaN
f            NaN  1.681111       NaN  2.480652
x      -0.407537  0.045479 -0.957010  0.789849
y       0.751488 -1.043122 -0.474536 -0.015152
z       0.283679 -2.034907  0.769553  0.301275
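In the example above, unstack() pivoted the innermost index level, second, into the columns. The level parameter selects a different level, either by position or by name. The following minimal sketch uses a hypothetical Series with fixed values so the result is predictable:

```python
import pandas as pd

# Hypothetical long-format Series with a two-level index
index = pd.MultiIndex.from_product([["x", "y"], ["1", "2"]], names=["first", "second"])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Default (innermost level): "second" becomes the columns, "first" stays as rows
by_second = s.unstack()
print(by_second)

# level="first" (equivalently level=0) pivots the outer level instead
by_first = s.unstack(level="first")
print(by_first)
```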

Handling Missing Data During Unstacking

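Unstacking can produce missing values when the subgroups being reshaped do not all have the same set of labels. In the previous example, the empty-string group has only the second label 1 and the f group has only 2, so their missing combinations appear as NaN. By default, Pandas fills these missing values with NaN, but you can specify a custom value with the fill_value parameter.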
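A related detail is worth knowing for integer data. NaN is a floating-point value, so unstacking integers that have gaps produces float columns. When fill_value is given, the gaps are filled directly and the integer dtype can be kept. The sketch below uses hypothetical integer data (the labels a, b, x and y are illustrative). It reflects the behavior of recent pandas versions, so it is worth confirming on your installed version.

```python
import pandas as pd

# Hypothetical integer data: group "b" has no "y" entry
index = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("b", "x")], names=["group", "item"]
)
s = pd.Series([1, 2, 3], index=index)

# Without fill_value the gap becomes NaN, which forces a float dtype
unfilled = s.unstack()
print(unfilled)
print(unfilled.dtypes)

# With fill_value=0 the gap is filled directly, so the values stay integers
filled = s.unstack(fill_value=0)
print(filled)
print(filled.dtypes)
```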

Example

This example demonstrates how to handle missing values when unstacking a DataFrame.

import pandas as pd
import numpy as np

# Create Data
index = pd.MultiIndex.from_product([["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"])
columns = pd.MultiIndex.from_tuples([("A", "cat"), ("B", "dog"), ("B", "cat"), ("A", "dog")], names=["exp", "animal"])

df = pd.DataFrame(np.random.randn(8, 4), index=index, columns=columns)

# Select a subset of rows and columns so that some groups are incomplete
df3 = df.iloc[[0, 1, 4, 7], [1, 2]]

print(df3)

# Unstack the DataFrame
unstacked = df3.unstack()

# Display the Unstacked DataFrame
print("Unstacked DataFrame without Filling:\n",unstacked)

unstacked_filled = df3.unstack(fill_value=1)
print("Unstacked DataFrame with Filling:\n",unstacked_filled)

The output of the above code is as follows:

exp                  B          
animal             dog       cat
first second                    
bar   one    -0.556587 -0.157084
      two     0.109060  0.856019
foo   one    -1.034260  1.548955
qux   two    -0.644370 -1.871248

Unstacked DataFrame without Filling:
exp            B                             
animal       dog                cat          
second       one      two       one       two
first                                        
bar    -0.556587  0.10906 -0.157084  0.856019
foo    -1.034260      NaN  1.548955       NaN
qux          NaN -0.64437       NaN -1.871248

Unstacked DataFrame with Filling:
exp            B                             
animal       dog                cat          
second       one      two       one       two
first                                        
bar    -0.556587  0.10906 -0.157084  0.856019
foo    -1.034260  1.00000  1.548955  1.000000
qux     1.000000 -0.64437  1.000000 -1.871248