Pandas groupby sum

Time: 2015-10-13 00:46:25

Tags: python pandas

Data sample; the real data spans many years. The type "Lien" or "Lien Endorsement" can appear only once per year. Other types can repeat within a year.

tax_allyears =

tax_year    type                amount  
2013        Lien Interest       4
2014        Lien Interest       10
2014        Lien                100
2014        Lien Interest       15
2013        Lien Endorsement    200
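To experiment with the sample, it can be loaded into a DataFrame. This is a minimal sketch that assumes the column names are exactly tax_year, type and amount, as shown in the table:

```python
import pandas as pd

# Sample rows from the question; column names assumed from the table header
tax_allyears = pd.DataFrame({
    'tax_year': [2013, 2014, 2014, 2014, 2013],
    'type': ['Lien Interest', 'Lien Interest', 'Lien', 'Lien Interest', 'Lien Endorsement'],
    'amount': [4, 10, 100, 15, 200],
})

print(tax_allyears)
```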

This line almost works: it sums the "Lien Interest" values per year.

by_year_interest = tax_allyears[tax_allyears['type'] == 'Lien Interest'].groupby(by=['tax_year'])['amount'].sum()
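On the sample data, that filter-then-groupby sums every "Lien Interest" row per year regardless of whether the year has a "Lien" or a "Lien Endorsement", so both years come back mixed in one Series. A minimal sketch, assuming the sample DataFrame is named tax_allyears:

```python
import pandas as pd

tax_allyears = pd.DataFrame({
    'tax_year': [2013, 2014, 2014, 2014, 2013],
    'type': ['Lien Interest', 'Lien Interest', 'Lien', 'Lien Interest', 'Lien Endorsement'],
    'amount': [4, 10, 100, 15, 200],
})

# Keep only the interest rows, then total them per year.
# Nothing here tells Lien years apart from Lien Endorsement years.
by_year_interest = (
    tax_allyears[tax_allyears['type'] == 'Lien Interest']
    .groupby('tax_year')['amount']
    .sum()
)

print(by_year_interest)  # 2013 -> 4, 2014 -> 25 (10 + 15)
```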

What I want is to distinguish the years that have a "Lien" from the years that have a "Lien Endorsement":

by_year_Lien_interest = some function

tax_year    amount
2014        25

by_year_Lien_Endorsement_interest = some function

tax_year    amount
2013        4

2 answers:

Answer 0 (score: 1)

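You can first build two separate lists of years: one of the years that have a Lien, and one of the years that have a Lien Endorsement. Then use these unique lists in your conditions, filtering the tax_allyears DataFrame with Series.isin. For example: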

lienyears = tax_allyears.loc[tax_allyears['type'] == 'Lien','tax_year'].unique().tolist()
lienendorsementyears = tax_allyears.loc[tax_allyears['type'] == 'Lien Endorsement','tax_year'].unique().tolist()

by_year_lien_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienyears)].groupby('tax_year')['amount'].sum()
by_year_lien_endorsement_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienendorsementyears)].groupby('tax_year')['amount'].sum()
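The two statements differ only in the marker type they look for, so they could be folded into one helper. A sketch under that reading; `interest_for_marker` is a hypothetical name, not a pandas function and not part of the original answer:

```python
import pandas as pd


def interest_for_marker(df: pd.DataFrame, marker: str) -> pd.Series:
    """Sum 'Lien Interest' per tax_year, keeping only years that contain a row of type `marker`."""
    # Years in which the marker type (e.g. 'Lien' or 'Lien Endorsement') appears
    marker_years = df.loc[df['type'] == marker, 'tax_year'].unique()
    # Interest rows that fall in one of those years
    mask = (df['type'] == 'Lien Interest') & df['tax_year'].isin(marker_years)
    return df[mask].groupby('tax_year')['amount'].sum()


tax_allyears = pd.DataFrame({
    'tax_year': [2013, 2014, 2014, 2014, 2013],
    'type': ['Lien Interest', 'Lien Interest', 'Lien', 'Lien Interest', 'Lien Endorsement'],
    'amount': [4, 10, 100, 15, 200],
})

by_year_lien_interest = interest_for_marker(tax_allyears, 'Lien')
by_year_lien_endorsement_interest = interest_for_marker(tax_allyears, 'Lien Endorsement')
```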

Demo:

In [7]: tax_allyears
Out[7]:
   tax_year              type  amount
0      2013     Lien Interest       4
1      2014     Lien Interest      10
2      2014              Lien     100
3      2014     Lien Interest      15
4      2013  Lien Endorsement     200

In [9]: lienyears = tax_allyears.loc[tax_allyears['type'] == 'Lien','tax_year'].unique().tolist()

In [10]: lienendorsementyears = tax_allyears.loc[tax_allyears['type'] == 'Lien Endorsement','tax_year'].unique().tolist()

In [13]: by_year_lien_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienyears)].groupby('tax_year')['amount'].sum()

In [15]: by_year_lien_endorsement_interest = tax_allyears[(tax_allyears['type'] == 'Lien Interest') & tax_allyears['tax_year'].isin(lienendorsementyears)].groupby('tax_year')['amount'].sum()

In [16]: by_year_lien_interest
Out[16]:
tax_year
2014    25
Name: amount, dtype: int64

In [17]: by_year_lien_endorsement_interest
Out[17]:
tax_year
2013    4
Name: amount, dtype: int64
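An alternative sketch, using a different technique from the answer's two isin filters: tag every "Lien Interest" row with the marker type of its year via Series.map, then run a single groupby over (marker, tax_year). It assumes each year has at most one "Lien" or "Lien Endorsement" row in total; if a year had both, the year-to-marker lookup would have a duplicate index and map would fail.

```python
import pandas as pd

tax_allyears = pd.DataFrame({
    'tax_year': [2013, 2014, 2014, 2014, 2013],
    'type': ['Lien Interest', 'Lien Interest', 'Lien', 'Lien Interest', 'Lien Endorsement'],
    'amount': [4, 10, 100, 15, 200],
})

# Assumption: at most one marker row per year, so tax_year is a unique index here
markers = tax_allyears[tax_allyears['type'].isin(['Lien', 'Lien Endorsement'])]
year_to_marker = markers.set_index('tax_year')['type']

interest = tax_allyears[tax_allyears['type'] == 'Lien Interest'].copy()
# Label each interest row with its year's marker (NaN if the year has no marker)
interest['marker'] = interest['tax_year'].map(year_to_marker)

# Rows with a NaN marker are dropped by groupby's default dropna=True
by_marker_year = interest.groupby(['marker', 'tax_year'])['amount'].sum()

by_year_lien_interest = by_marker_year.loc['Lien']
by_year_lien_endorsement_interest = by_marker_year.loc['Lien Endorsement']
```

One groupby yields both results at once, which may be convenient when there are many years, at the cost of relying on the one-marker-per-year assumption.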

Answer 1 (score: 0)

If tax_year, type and amount are the column names of your DataFrame, you can do:

# Group the rows by each (tax_year, type) pair
grouped = df.groupby(['tax_year', 'type'])

# Sum the amounts within each (tax_year, type) group
df = grouped.sum()
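Note that this produces one total per (tax_year, type) pair; on its own it does not restrict the interest sums to Lien or Lien Endorsement years. A sketch on the sample data that selects the amount column and pivots the types into columns with unstack, so each year's marker and interest total sit side by side (the DataFrame name df follows the answer):

```python
import pandas as pd

df = pd.DataFrame({
    'tax_year': [2013, 2014, 2014, 2014, 2013],
    'type': ['Lien Interest', 'Lien Interest', 'Lien', 'Lien Interest', 'Lien Endorsement'],
    'amount': [4, 10, 100, 15, 200],
})

# Total amount for every (tax_year, type) pair, as in the answer
totals = df.groupby(['tax_year', 'type'])['amount'].sum()

# Move 'type' into columns: one row per year, absent pairs filled with 0
wide = totals.unstack(fill_value=0)
print(wide)
```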