I am trying to drop columns that contain a certain percentage of missing values. Here is a working example:
import numpy as np
import pandas as pd

raw_data = {'first_name': ['Jason', np.nan, 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', np.nan, 'Ali', 'Milner', 'Cooze'],
            'age': [42, '', '', '', 73],
            'sex': ['m', np.nan, 'f', 'm', 'f'],
            'preTestScore': [4, np.nan, np.nan, 2, 3],
            'postTestScore': [25, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'age',
                                     'sex', 'preTestScore', 'postTestScore'])
df
  first_name last_name age  sex  preTestScore  postTestScore
0      Jason    Miller  42    m           4.0           25.0
1        NaN       NaN      NaN           NaN            NaN
2       Tina       Ali        f           NaN            NaN
3       Jake    Milner        m           2.0           62.0
4        Amy     Cooze  73    f           3.0           70.0
df = df.dropna(thresh=0.7*len(df), axis=1)
df
  first_name last_name age  sex
0      Jason    Miller  42    m
1        NaN       NaN      NaN
2       Tina       Ali        f
3       Jake    Milner        m
4        Amy     Cooze  73    f
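How can I drop the 'age' column as well? I have spent hours with dropna and tried putting zeros into the empty cells. I cannot figure out how to detect the missing cells in the 'age' column.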
Answer 0 (score: 4)
You need replace first, then dropna:
df=df.replace({'':np.nan})
df = df.dropna(thresh=0.7*len(df), axis=1)
df
Out[858]:
  first_name last_name  sex
0      Jason    Miller    m
1        NaN       NaN  NaN
2       Tina       Ali    f
3       Jake    Milner    m
4        Amy     Cooze    f
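The original dropna call kept 'age' because an empty string is an ordinary value, not a missing one, so it counts toward thresh. A minimal sketch of the difference, assuming a reduced version of the data from the question; the whitespace regex and the use of math.ceil are additions for illustration, not part of the answer above:

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Jason", np.nan, "Tina", "Jake", "Amy"],
    "age": [42, "", "", "", 73],
    "preTestScore": [4, np.nan, np.nan, 2, 3],
})

# Empty strings are not counted as missing by isna().
missing_before = int(df["age"].isna().sum())

# Turn empty or whitespace-only strings into NaN.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)
missing_after = int(cleaned["age"].isna().sum())

# thresh is the minimum number of non-missing values a column needs to be kept.
min_non_missing = math.ceil(0.7 * len(cleaned))
result = cleaned.dropna(thresh=min_non_missing, axis=1)

print(missing_before, missing_after, list(result.columns))
```

With five rows, a 0.7 ratio means a column needs at least 4 non-missing values to survive. After the replacement 'age' has only 2, so it is dropped.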
Answer 1 (score: 1)
First replace '' (empty strings) with NaN, then use dropna():
df = df.replace({'':np.nan})
df
  first_name last_name   age  sex  preTestScore  postTestScore
0      Jason    Miller  42.0    m           4.0           25.0
1        NaN       NaN   NaN  NaN           NaN            NaN
2       Tina       Ali   NaN    f           NaN            NaN
3       Jake    Milner   NaN    m           2.0           62.0
4        Amy     Cooze  73.0    f           3.0           70.0
You can check the percentage of missing values per column with the following function:
def missing(dff):
    # Print the share of missing values per column, largest first.
    print("Missing values in %")
    print(round((dff.isnull().sum() * 100 / len(dff)), 2).sort_values(ascending=False))

missing(df)
Missing values in %
age              60.0
postTestScore    40.0
preTestScore     40.0
sex              20.0
last_name        20.0
first_name       20.0
dtype: float64
Say you want to drop all columns in which 60% or more of the values are missing:
df = df.drop(df.loc[:, list(100 * (df.isnull().sum() / len(df.index)) >= 60)].columns, axis=1)
  first_name last_name  sex  preTestScore  postTestScore
0      Jason    Miller    m           4.0           25.0
1        NaN       NaN  NaN           NaN            NaN
2       Tina       Ali    f           NaN            NaN
3       Jake    Milner    m           2.0           62.0
4        Amy     Cooze    f           3.0           70.0
Note: the 'age' column (60% of its values missing) has been dropped.
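The same filter can probably be written more compactly with a boolean mask, since isna().mean() gives the fraction of missing values per column. A sketch, assuming the empty strings have already been replaced with NaN; drop_sparse_columns and max_missing are hypothetical names chosen for illustration:

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame, max_missing):
    """Drop columns whose fraction of missing values is >= max_missing."""
    missing_fraction = frame.isna().mean()
    # Keep only the columns that stay below the cutoff.
    return frame.loc[:, missing_fraction < max_missing]


df = pd.DataFrame({
    "first_name": ["Jason", np.nan, "Tina", "Jake", "Amy"],
    "age": [42, np.nan, np.nan, np.nan, 73],
    "preTestScore": [4, np.nan, np.nan, 2, 3],
})

trimmed = drop_sparse_columns(df, 0.6)
print(list(trimmed.columns))
```

Selecting the columns to keep avoids building the list of columns to drop and sidesteps the axis argument of drop altogether.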
Answer 2 (score: 0)
How about using dropna from pandas:
def drop_columns(df, threshold):
    # Keep columns with at least (1 - threshold) * len(df) non-missing values.
    return df.dropna(axis=1, thresh=len(df) * (1 - threshold))
(This is my first answer, so apologies if I am not following etiquette.)
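One possible way to call such a function, replacing the empty strings first. This is only a sketch: the sample frame and the 0.5 threshold are chosen for illustration, and math.ceil is an addition here so that thresh is passed as an integer.

```python
import math

import numpy as np
import pandas as pd


def drop_columns(df, threshold):
    # Keep columns with at least (1 - threshold) * len(df) non-missing values.
    min_non_missing = math.ceil(len(df) * (1 - threshold))
    return df.dropna(axis=1, thresh=min_non_missing)


raw = pd.DataFrame({
    "sex": ["m", np.nan, "f", "m", "f"],
    "age": [42, "", "", "", 73],
    "preTestScore": [4, np.nan, np.nan, 2, 3],
})

# Empty strings must become NaN before dropna can see them as missing.
kept = drop_columns(raw.replace({"": np.nan}), 0.5)
print(list(kept.columns))
```

Here a column needs at least 3 non-missing values out of 5, so 'age' (2 values) is dropped while 'sex' and 'preTestScore' are kept.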