DataFrame subtraction and assignment returns NA

Time: 2019-03-24 14:49:43

Tags: python pandas

Suppose I have a dataset (df_data) that looks like this:

Time    Geography                Population
2016    England and Wales        58381200
2017    England and Wales        58744600
2016    Northern Ireland         1862100
2017    Northern Ireland         1870800
2016    Scotland                 5404700
2017    Scotland                 5424800
2016    Wales                    3113200
2017    Wales                    3125200

If I do the following:

df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']

df_england = df_engl_n_wales

df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']

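then df_england has NA values in the Population column.

How can I fix this?

By the way, I have read the related posts, but their suggestions (.loc, .copy, etc.) did not work for me.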

2 Answers:

Answer 0: (score: 1)

I would simply do the following:

df_nireland = df_data[df_data['Geography']=='Northern Ireland'].reset_index(drop=True)
df_wales = df_data[df_data['Geography']=='Wales'].reset_index(drop=True)
df_scotland = df_data[df_data['Geography']=='Scotland'].reset_index(drop=True)
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales'].reset_index(drop=True)

# .copy() so the assignment below does not also modify df_engl_n_wales
df_england = df_engl_n_wales.copy()

df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population']
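
The reason this helps: pandas aligns two Series on their index labels before subtracting. The filtered frames keep the row labels of df_data, so with the sample data the "England and Wales" rows are labelled 0 and 1 while the "Wales" rows are labelled 6 and 7. No labels match, and every result is NaN. Here is a minimal sketch of that behaviour, assuming df_data is built from the table in the question with a default integer index (the construction below is my assumption, not the asker's code):

```python
import pandas as pd

# Sample data copied from the question; the default RangeIndex (0..7) is assumed.
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

df_engl_n_wales = df_data[df_data["Geography"] == "England and Wales"]
df_wales = df_data[df_data["Geography"] == "Wales"]

# Row labels 0, 1 versus 6, 7: nothing lines up, so every entry is NaN.
misaligned = df_engl_n_wales["Population"] - df_wales["Population"]

# After reset_index(drop=True) both Series are labelled 0, 1 and subtract row by row.
aligned = (df_engl_n_wales["Population"].reset_index(drop=True)
           - df_wales["Population"].reset_index(drop=True))

print(misaligned)
print(aligned)
```

Note that resetting the index only pairs rows by position, so it relies on both subsets listing the years in the same order.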

Or, arguably better in principle because it keeps the index of the original DataFrame, do this:

df_nireland = df_data[df_data['Geography']=='Northern Ireland']
df_wales = df_data[df_data['Geography']=='Wales']
df_scotland = df_data[df_data['Geography']=='Scotland']
df_engl_n_wales = df_data[df_data['Geography']=='England and Wales']

# .copy() so the assignment below does not also modify df_engl_n_wales
df_england = df_engl_n_wales.copy()

df_england['Population'] = df_engl_n_wales['Population'] - df_wales['Population'].values
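
Taking .values drops the index from the right-hand side, so the subtraction is done by position and the result keeps the row labels of the left-hand side. Below is a self-contained sketch of the same idea, assuming the sample data from the question. The sort_values("Time") calls and the .copy() are my additions: the sort guards against the two subsets listing years in different orders, and .to_numpy() is the spelling the pandas documentation now recommends over .values.

```python
import pandas as pd

# Sample data copied from the question; the default RangeIndex (0..7) is assumed.
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

# Sort both subsets by Time so that position i refers to the same year in each.
df_engl_n_wales = df_data[df_data["Geography"] == "England and Wales"].sort_values("Time")
df_wales = df_data[df_data["Geography"] == "Wales"].sort_values("Time")

# Work on an independent copy so df_engl_n_wales keeps its original values.
df_england = df_engl_n_wales.copy()

# Series minus NumPy array: positional subtraction, left-hand index is preserved.
df_england["Population"] = (df_engl_n_wales["Population"]
                            - df_wales["Population"].to_numpy())

print(df_england)
```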

Answer 1: (score: 0)

This is really a data organization problem. If you pivot, the subtraction becomes easy and is guaranteed to align on Time:

df_pop = df_data.pivot(index='Time', columns='Geography', values='Population')
df_pop['England'] = df_pop['England and Wales'] - df_pop['Wales']

Output of df_pop:

Geography  England and Wales  Northern Ireland  Scotland    Wales   England
Time                                                                       
2016                58381200           1862100   5404700  3113200  55268000
2017                58744600           1870800   5424800  3125200  55619400
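
To illustrate the alignment point, here is a self-contained sketch that shuffles the rows before pivoting and still gets the same result. It assumes the sample data from the question; the shuffle with sample(frac=1, random_state=0) is only there for demonstration and is not part of the original approach.

```python
import pandas as pd

# Sample data copied from the question.
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

# Shuffle the rows to show that the result does not depend on row order.
shuffled = df_data.sample(frac=1, random_state=0)

# One row per Time, one column per Geography.
df_pop = shuffled.pivot(index="Time", columns="Geography", values="Population")

# Both columns share the Time index, so the subtraction is matched year by year.
df_pop["England"] = df_pop["England and Wales"] - df_pop["Wales"]

print(df_pop)
```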

If you need to get back to the original format, you can do:

df_pop.stack().to_frame('Population').reset_index()

#   Time          Geography  Population
#0  2016  England and Wales    58381200
#1  2016   Northern Ireland     1862100
#2  2016           Scotland     5404700
#3  2016              Wales     3113200
#4  2016            England    55268000
#5  2017  England and Wales    58744600
#6  2017   Northern Ireland     1870800
#7  2017           Scotland     5424800
#8  2017              Wales     3125200
#9  2017            England    55619400
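
As an alternative to stack, melt can also return the wide table to long form. This is a different technique from the one above, shown here only as a sketch on the sample data from the question. The rows come out grouped by Geography rather than by Time, so sort afterwards if the order matters.

```python
import pandas as pd

# Sample data copied from the question.
df_data = pd.DataFrame({
    "Time": [2016, 2017] * 4,
    "Geography": ["England and Wales"] * 2 + ["Northern Ireland"] * 2
                 + ["Scotland"] * 2 + ["Wales"] * 2,
    "Population": [58381200, 58744600, 1862100, 1870800,
                   5404700, 5424800, 3113200, 3125200],
})

df_pop = df_data.pivot(index="Time", columns="Geography", values="Population")
df_pop["England"] = df_pop["England and Wales"] - df_pop["Wales"]

# Move Time back to a column, then unpivot every Geography column into rows.
df_long = df_pop.reset_index().melt(
    id_vars="Time", var_name="Geography", value_name="Population"
)

print(df_long)
```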