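Some columns of my dataframe energy have missing values. In the dataframe, the missing values are represented by the string "...". I want to replace them with np.NaN.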
In [3]: import pandas as pd
In [4]: import numpy as np
In [7]: energy = pd.read_excel('test.xls', skiprows = 17, skip_footer = 38, parse_cols = range(2, 6), index_col = None, names = ['Country', 'ES'
...: , 'ESC', '% Renewable'])
In [8]: energy[(energy['ES'] == "...") | (energy['ESC'] == "...")]
Out[8]:
                          Country   ES  ESC  % Renewable
3                  American Samoa  ...  ...     0.641026
86                           Guam  ...  ...     0.000000
150      Northern Mariana Islands  ...  ...     0.000000
210                        Tuvalu  ...  ...     0.000000
217  United States Virgin Islands  ...  ...     0.000000
To replace these values, I tried:
In [9]: energy[(energy['ES'] == "...")]['ES'] = np.NaN
/usr/local/bin/ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I don't understand the warning, and I don't see any other way to achieve what I want. Any ideas?
Answer 0 (score: 1)
I think you need:
energy['ES'] = energy.loc[energy['ES'] != "...", 'ES']
Another solution:
energy['ES'] = energy['ES'].mask(energy['ES'] == "...")
Or:
energy['ES'] = energy['ES'].replace({'...': np.nan})
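For comparison, here is a minimal sketch of these approaches on a small made-up frame (the data is a hypothetical stand-in for the spreadsheet, not the real file). It also shows the single `.loc` assignment that the warning in the question recommends.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real spreadsheet data.
energy = pd.DataFrame({
    "Country": ["American Samoa", "Australia", "Guam"],
    "ES": ["...", 5386, "..."],
})

# energy[mask]['ES'] = np.nan first builds a copy (energy[mask]) and then
# assigns into that copy, so the original frame is never modified.
# A single .loc call selects the rows and the column in one step
# and writes into the frame itself.
by_loc = energy.copy()
by_loc.loc[by_loc["ES"] == "...", "ES"] = np.nan

# Series.mask replaces values where the condition is True (NaN by default).
by_mask = energy["ES"].mask(energy["ES"] == "...")

# Series.replace maps the placeholder string to NaN.
by_replace = energy["ES"].replace({"...": np.nan})
```

All three variants leave NaN in the rows that held "..." and keep the other values.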
But the best option is from ayhan's comment:
You can pass na_values='...' to pd.read_excel
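A small sketch of the `na_values` idea follows. To keep it self-contained it reads made-up CSV text with `pd.read_csv` instead of the Excel file; that swap is an assumption for illustration only, and the same `na_values` parameter is also accepted by `pd.read_excel`.

```python
import io

import pandas as pd

# Hypothetical sample data; "..." marks a missing value, as in the question.
csv_text = (
    "Country,ES,ESC\n"
    "American Samoa,...,...\n"
    "Australia,5386,231\n"
)

# na_values lists extra strings that the parser should treat as missing.
energy = pd.read_csv(io.StringIO(csv_text), na_values=["..."])
```

Because the placeholder is turned into NaN while parsing, ES and ESC come out as numeric columns and no later replacement step is needed.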
Answer 1 (score: 0)
If Energy is your pandas dataframe, you can also try:
for col in Energy.columns:
    Energy[col] = pd.to_numeric(Energy[col], errors = 'coerce')
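The code above converts every value that cannot be parsed as a number into NaN, in every column of the dataframe. Note that this includes text columns such as Country, which would become entirely NaN, so in practice the loop should run only over the numeric columns.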
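Because `pd.to_numeric` with `errors='coerce'` turns every unparseable value into NaN, a text column such as Country would be wiped out as well. A minimal sketch that limits the coercion, assuming only ES and ESC need converting (the sample data and the `numeric_cols` list are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with "..." as the missing-value placeholder.
energy = pd.DataFrame({
    "Country": ["American Samoa", "Australia"],
    "ES": ["...", "5386"],
    "ESC": ["...", "231"],
})

# Assumed list of columns that should hold numbers.
numeric_cols = ["ES", "ESC"]

# Coerce only those columns; anything that is not a number becomes NaN.
energy[numeric_cols] = energy[numeric_cols].apply(pd.to_numeric, errors="coerce")
```

The Country column is left as text, while "..." in ES and ESC becomes NaN.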