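Python Pandas - Fill missing column values with the median
The median separates the higher half of the data from the lower half. To fill the missing values in a column with its median, compute the median with median() and pass it to the fillna() method. First, import the required libraries under their usual aliases:
import pandas as pd
import numpy as np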
Create a DataFrame with 2 columns. The missing values are set with NumPy's np.nan (the np.NaN alias was removed in NumPy 2.0):
dataFrame = pd.DataFrame(
    {
        "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
        "Units": [100, 150, np.nan, 80, np.nan, np.nan]
    }
)
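Before filling anything, it can help to confirm where the missing values are and what the median will be. The following is a minimal sketch on the same data; the names missing_counts and units_median are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the DataFrame above
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan],
})

# Count the missing values in each column
missing_counts = dataFrame.isna().sum()
print(missing_counts)

# median() skips NaN by default (skipna=True),
# so only 100, 150 and 80 take part in the calculation
units_median = dataFrame['Units'].median()
print(units_median)
```

Here the Units column has three missing entries, and the median of the remaining values 80, 100 and 150 is 100.0.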
Find the median of the column that contains NaN values, which here is the Units column. Call median() on the Units column and replace the NaN values with the result:
dataFrame.fillna(dataFrame['Units'].median(), inplace=True)
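Note that calling fillna() on the whole DataFrame with a single value fills NaN in every column, not only in Units. That makes no difference for the data above, because only Units has missing values. When other columns may also contain gaps, one option is to fill just the one column and assign it back. The sketch below assumes a hypothetical variant of the data in which one car name is missing as well; the variable name df is also just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the data: one car name is also missing
df = pd.DataFrame({
    "Car": ['Lexus', 'BMW', None, 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan],
})

# Fill only the Units column with its own median; Car is left untouched
df['Units'] = df['Units'].fillna(df['Units'].median())
print(df)
```

The missing car name stays missing, while the three gaps in Units become 100.0.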
Example
Following is the code:
import pandas as pd
import numpy as np

# Create DataFrame
dataFrame = pd.DataFrame(
    {
        "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
        "Units": [100, 150, np.nan, 80, np.nan, np.nan]
    }
)

print("DataFrame ...\n", dataFrame)

# Find the median of the column that has NaN values, i.e. the Units column here,
# and replace the NaNs with that median
dataFrame.fillna(dataFrame['Units'].median(), inplace=True)

print("\nUpdated Dataframe after filling NaN values with median...\n", dataFrame)
Output
This will produce the following output:
DataFrame ...
        Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi    NaN
3  Bentley   80.0
4  Mustang    NaN
5    Tesla    NaN

Updated Dataframe after filling NaN values with median...
        Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi  100.0
3  Bentley   80.0
4  Mustang  100.0
5    Tesla  100.0
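The same idea extends to several numeric columns at once, each filled with its own median. The following is a rough sketch under assumed data: the Price column and its values are hypothetical and added only to have a second numeric column. It relies on fillna() accepting a Series whose index holds the column names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a second numeric column, Price, is added for illustration
df = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley'],
    "Units": [100, 150, np.nan, 80],
    "Price": [40.0, np.nan, 30.0, 90.0],
})

# One median per numeric column; the text column Car is skipped
medians = df.median(numeric_only=True)

# fillna() matches the Series index against the column names,
# so each column gets its own median
filled = df.fillna(medians)
print(filled)
```

With this data the gap in Units is filled with 100.0 and the gap in Price with 40.0. The result is returned as a new DataFrame rather than modified in place.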