Replace missing values (given as strings) in a pandas DataFrame with np.NaN

Time: 2016-12-21 16:19:59

Tags: python python-3.x pandas dataframe missing-data

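Some columns of my DataFrame energy have missing values. The missing values are represented in the DataFrame by the string "...". I want to replace all of these values with np.NaN.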
In [3]: import pandas as pd

In [4]: import numpy as np

In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
   ...: , 'ESC', '% Renewable'])

In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]: 
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000

To replace these values, I tried:

In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

I don't understand the warning, and I don't see any other way to achieve what I want. Any ideas?

2 Answers:

Answer 0 (score: 1)

I think you need:

energy['ES'] = energy.loc[energy['ES'] != "...", 'ES'] 

Another solution:

energy['ES'] = energy['ES'].mask(energy['ES'] == "...")

Or:

energy['ES'] = energy['ES'].replace({'...': np.nan})
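The warning in the question appears because energy[...]['ES'] = ... is chained indexing: the first selection may return a temporary copy, so the assignment may never reach energy. Below is a minimal sketch of the alternatives on a toy frame. The data is made up, and the names via_loc, via_mask and via_replace are hypothetical, used only for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's data (values are made up).
energy = pd.DataFrame({
    "Country": ["Guam", "Iceland", "Tuvalu"],
    "ES": ["...", 1.5, "..."],
})

# One .loc call takes the row mask and the column label together,
# so the assignment targets the frame itself, not a temporary copy.
via_loc = energy.copy()
via_loc.loc[via_loc["ES"] == "...", "ES"] = np.nan

# The mask and replace variants from this answer, for comparison.
via_mask = energy["ES"].mask(energy["ES"] == "...")
via_replace = energy["ES"].replace({"...": np.nan})
```

All three leave NaN in the rows that held "...". The column may still have object dtype afterwards, so a later pd.to_numeric call may be needed before doing arithmetic.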

But the best is ayhan's comment:

  

You can pass na_values='...' to pd.read_excel
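A rough sketch of that idea follows. To stay self-contained it uses pd.read_csv on an in-memory string instead of pd.read_excel on the question's test.xls; both readers accept an na_values argument. The CSV text is made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the spreadsheet in the question.
raw = io.StringIO(
    "Country,ES,ESC\n"
    "Guam,...,...\n"
    "Iceland,1.5,20.0\n"
)

# na_values adds "..." to the strings that are treated as missing on load.
energy = pd.read_csv(raw, na_values=["..."])
```

Because the placeholder is dropped while parsing, ES and ESC should come out as float columns with NaN in the Guam row, and no replacement step is needed afterwards.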

Answer 1 (score: 0)

If Energy is your pandas DataFrame, you can also try:

for col in Energy.columns:
    Energy[col] = pd.to_numeric(Energy[col], errors = 'coerce')

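The code above converts every value that cannot be parsed as a number to NaN, in every column of the DataFrame. Note that this also applies to text columns such as Country, so you may want to loop over the numeric columns only.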
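A minimal sketch of restricting the conversion to selected columns, on a toy frame with made-up values. The list numeric_cols is a hypothetical choice for this example, not something taken from the question.

```python
import pandas as pd

# Toy frame: the numeric columns arrive as strings, with "..." for missing.
energy = pd.DataFrame({
    "Country": ["Guam", "Iceland"],
    "ES": ["...", "1.5"],
    "ESC": ["...", "20"],
})

# Coerce only the columns expected to be numeric; "Country" is left alone.
numeric_cols = ["ES", "ESC"]  # hypothetical selection for this toy frame
for col in numeric_cols:
    energy[col] = pd.to_numeric(energy[col], errors="coerce")
```

With errors="coerce", anything that cannot be parsed as a number becomes NaN, which is why applying it to a text column would wipe that column out.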