Add a new column in pandas that is the sum of another column's values

Asked: 2018-09-10 09:18:57

Tags: python pandas numpy pandas-groupby

I'm working with pandas and trying to add a new column, "Total", that holds the sum of the number of all vehicles for that year.

From this:

type             year    number
Private cars     2005    401638
Motorcycles      2005    138588
Off peak cars    2005    12947
Motorcycles      2005    846

To something like this:

type             year    number    Total
Private cars     2005    401638    554019
Motorcycles      2005    138588
Off peak cars    2005    12947
Motorcycles      2005    846

3 Answers:

Answer 0 (score: 2)

Use GroupBy + transform with sum:

df['Year_Total'] = df.groupby('year')['number'].transform('sum')

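Note that this gives you the yearly total on every row. If you want the total to be "blank" on some rows, you need to specify the logic for that precisely.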
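To illustrate why transform is used here rather than a plain aggregation, here is a minimal sketch. It assumes the question's data is loaded into a DataFrame named `df`; the construction below and the names `per_year` and `aligned` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "type": ["Private cars", "Motorcycles", "Off peak cars", "Motorcycles"],
    "year": [2005, 2005, 2005, 2005],
    "number": [401638, 138588, 12947, 846],
})

# A plain aggregation returns one value per year, indexed by year.
per_year = df.groupby("year")["number"].sum()

# transform('sum') returns one value per original row, aligned with df's index,
# so it can be assigned directly as a new column.
aligned = df.groupby("year")["number"].transform("sum")
df["Year_Total"] = aligned

print(per_year)
print(df)
```

Because the transformed Series shares the index of `df`, no merge or join is needed to attach the totals to the rows.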

Answer 1 (score: 2)

Use GroupBy.transform, then replace the duplicated values if necessary:

df['Total'] = df.groupby('year')['number'].transform('sum')
print (df)
            type  year  number  Total
0   Private cars  2005       1      3
1    Motorcycles  2005       2      3
2  Off peak cars  2006       5     20
3    Motorcycles  2006       7     20
4   Motorcycles1  2006       8     20

# Blank out the total on every row after the first of each year (needs: import numpy as np)
df.loc[df['year'].duplicated(), 'Total'] = np.nan
print (df)
            type  year  number  Total
0   Private cars  2005       1    3.0
1    Motorcycles  2005       2    NaN
2  Off peak cars  2006       5   20.0
3    Motorcycles  2006       7    NaN
4   Motorcycles1  2006       8    NaN

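You can replace them with empty strings instead, but this is not recommended: the column then holds mixed numeric and string values, and some functions will fail on it: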

df.loc[df['year'].duplicated(), 'Total'] = ''
print (df)
            type  year  number Total
0   Private cars  2005       1     3
1    Motorcycles  2005       2      
2  Off peak cars  2006       5    20
3    Motorcycles  2006       7      
4   Motorcycles1  2006       8      
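The effect of the two choices on the column's dtype can be checked directly. This is a rough sketch using the same small sample values as above; the names `total_nan`, `total_blank` and `recovered` are hypothetical, and `pd.to_numeric` is shown only as one possible way back to a numeric column.

```python
import pandas as pd

# Small sample mirroring the answer's data (the type column is omitted for brevity).
df = pd.DataFrame({
    "year": [2005, 2005, 2006, 2006, 2006],
    "number": [1, 2, 5, 7, 8],
})

totals = df.groupby("year")["number"].transform("sum")
is_repeat = df["year"].duplicated()

# NaN placeholders keep the column numeric (float64).
total_nan = totals.where(~is_repeat)

# Empty-string placeholders force a generic object column of mixed ints and strings.
total_blank = totals.astype(object).where(~is_repeat, "")

print(total_nan.dtype)
print(total_blank.dtype)

# errors="coerce" turns the unparseable blanks back into NaN.
recovered = pd.to_numeric(total_blank, errors="coerce")
print(recovered.sum())
```

If blanks are only wanted for display, keeping NaN in the data and formatting at output time avoids the mixed-type column altogether.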

Answer 2 (score: 0)

This gives a similar DataFrame:

import numpy as np

total = df['number'].sum()  # grand total over the whole column
df['Total'] = np.ones_like(df['number'].values) * total
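Note that this approach computes a single grand total for the whole column rather than one total per year; the two coincide only when the data covers a single year, as in the question's sample. A small sketch of the difference, using made-up two-year data and illustrative column names:

```python
import numpy as np
import pandas as pd

# Made-up data spanning two years, purely for illustration.
df = pd.DataFrame({"year": [2005, 2005, 2006], "number": [1, 2, 5]})

grand_total = df["number"].sum()

# Assigning a scalar broadcasts it to every row, so np.ones_like is optional.
df["Grand_Total"] = grand_total
df["Via_ones_like"] = np.ones_like(df["number"].values) * grand_total

# Per-year totals, for comparison.
df["Year_Total"] = df.groupby("year")["number"].transform("sum")

print(df)
```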