Python - Pandas - KeyError during a dropna call on a specific subset

Time: 2017-11-18 01:04:22

Tags: python pandas dataframe

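My goal: I want to drop rows that have NaN in specific columns. NaN is acceptable in some columns but not in others. In plain English: if the value of 'detail_age' in a row is NaN, I want to drop that row.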

Here is a look at my data:

import pandas as pd
df = pd.read_csv('allDeaths.csv', index_col=0, nrows=3, engine='python')
print(df.shape)
print(list(df))

Which outputs:

(3,15)
['education_1989_revision', 'education_2003_revision', 
'education_reporting_flag', 'sex', 'detail_age', 'marital_status', 
'current_data_year', 'injury_at_work', 'manner_of_death', 'activity_code', 
'place_of_injury_for_causes_w00_y34_except_y06_and_y07_', '358_cause_recode', 
'113_cause_recode', '39_cause_recode', 'race']

When I try to drop the rows where those columns are NaN using the following:

df.dropna(subset=[2,3,4,5,6,7,8,9,11,12,13,14], axis=1, inplace=True, how='any')

I get the following error:

Traceback (most recent call last):
  File "clean.py", line 10, in <module>
    df.dropna(subset=[2,3,4,5,6,7,8,9,11,12,13,14], axis=1, inplace=True, how='any')
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 3052, in dropna
    raise KeyError(list(np.compress(check, subset)))
KeyError: [3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14]

This is strange, because this works:

df.dropna(subset=[2], axis=1, inplace=True, how='any')

But this does not:

df.dropna(subset=[5], axis=1, inplace=True, how='any')

So there must be a problem with certain columns, or with the values in those columns. Here is my data as shown by df.head(3):

As image because formatting is annoying

1 Answer:

Answer 0: (score: 2)

Demo:

In [360]: df
Out[360]:
      A     B     C   D
0   1.0   2.0   NaN   4
1   5.0   NaN   7.0   8
2   NaN  10.0  11.0  12
3  13.0  14.0  15.0  16

In [362]: df = df.dropna(subset=df.columns[[1,2]], how='any')

In [363]: df
Out[363]:
      A     B     C   D
2   NaN  10.0  11.0  12
3  13.0  14.0  15.0  16
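
To reproduce the demo outside IPython, the frame can be rebuilt by hand. This is only a minimal sketch: the constructor call is reconstructed from the printed output above, and `wanted` and `cleaned` are illustrative names that do not appear in the original session.

```python
import numpy as np
import pandas as pd

# Rebuild the demo frame from the printed output (values copied by hand).
df = pd.DataFrame({
    "A": [1.0, 5.0, np.nan, 13.0],
    "B": [2.0, np.nan, 10.0, 14.0],
    "C": [np.nan, 7.0, 11.0, 15.0],
    "D": [4, 8, 12, 16],
})

# Translate column positions 1 and 2 into their labels.
wanted = df.columns[[1, 2]].tolist()

# Default axis=0: drop every row that has NaN in any of the wanted columns.
cleaned = df.dropna(subset=wanted, how="any")
print(cleaned)
```

Row 2 survives even though `A` is NaN there, because `A` is not part of the subset.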

PS: of course you can specify the column names instead:

In [370]: df.dropna(subset=['B','C'], how='any')
Out[370]:
      A     B     C   D
2   NaN  10.0  11.0  12
3  13.0  14.0  15.0  16
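
As for why the original call raised KeyError: with axis=1, dropna drops columns, and the labels in subset are looked up in the row index rather than in the column names. That would explain why subset=[2] worked (2 was presumably one of the three row labels) while 5 and the others did not. The sketch below illustrates this on a small stand-in frame. The data and the index values 1, 2, 3 are assumptions made for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: three rows labelled 1, 2, 3.
df = pd.DataFrame(
    {
        "sex": ["M", "F", "M"],
        "detail_age": [70.0, np.nan, 45.0],
        "marital_status": ["M", "S", None],
        "race": [1, 1, 2],
    },
    index=[1, 2, 3],
)

# axis=1 drops COLUMNS; subset then names ROW labels.
# Row label 2 exists, so this runs and drops the columns that are NaN in row 2.
cols_dropped = df.dropna(subset=[2], axis=1, how="any")

# Row label 5 does not exist, so this raises KeyError.
try:
    df.dropna(subset=[5], axis=1, how="any")
    raised = False
except KeyError:
    raised = True

# To drop ROWS, keep the default axis=0 and pass column labels,
# here obtained from column positions 1 and 2.
wanted = df.columns[[1, 2]].tolist()
cleaned = df.dropna(subset=wanted, how="any")
```

Applied to the question, the same pattern would be df.dropna(subset=df.columns[[2,3,4,5,6,7,8,9,11,12,13,14]], how='any'), with axis left at its default.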