I have the following DataFrame:
import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/mpl.Bspons.merge.1'
df = pd.read_csv(url, index_col=0)
# Parse the date column and use it as the index
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index(['date'])
df.head(3)
state year unemployment log_diff_unemployment id.thomas party type bills id.fec years_exp session name disposition catcode naics
date
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 81
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 517
2007-03-27 AK 2007 6.3 -0.046520 1440 Republican sen s1000-110 S2AK00010 40 110 National Treasury Employees Union support L1100 NaN
I want to count the number of bills in each group defined by catcode > disposition > id.fec. I am using the following code:
df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode',
                            'disposition', 'id.fec']).bills.transform('sum')
This returns:
df.head(3)
state year unemployment log_diff_unemployment id.thomas party type bills id.fec years_exp session name disposition catcode naics billsum
date
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 81 s2686-109s2686-109
2006-05-01 AK 2006 6.6 -0.044452 1440 Republican sen s2686-109 S2AK00010 39 109 National Cable & Telecommunications Association support C4500 517 s2686-109s2686-109
2007-03-27 AK 2007 6.3 -0.046520 1440 Republican sen s1000-110 S2AK00010 40 110 National Treasury Employees Union support L1100 NaN s1000-110
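Instead of returning the number of bills in each group, the code returns all of the bills in each group concatenated into one string. I only want the number of bills in each group. Does anyone know how to make this work?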
Answer 0 (score: 1)
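The bills column holds strings, so 'sum' concatenates them. Use 'size' instead, which returns the number of rows in each group:
df['billsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode',
                            'disposition', 'id.fec']).bills.transform('size')
print(df.head(3))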
state year unemployment log_diff_unemployment id.thomas \
date
2006-05-01 AK 2006.0 6.6 -0.044452 1440
2006-05-01 AK 2006.0 6.6 -0.044452 1440
2007-03-27 AK 2007.0 6.3 -0.046520 1440
party type bills id.fec years_exp session \
date
2006-05-01 Republican sen s2686-109 S2AK00010 39 109
2006-05-01 Republican sen s2686-109 S2AK00010 39 109
2007-03-27 Republican sen s1000-110 S2AK00010 40 110
name disposition \
date
2006-05-01 National Cable & Telecommunications Association support
2006-05-01 National Cable & Telecommunications Association support
2007-03-27 National Treasury Employees Union support
catcode naics billsum
date
2006-05-01 C4500 81 2
2006-05-01 C4500 517 2
2007-03-27 L1100 NaN 1
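For anyone who wants to try this without downloading the file, here is a minimal sketch on three rows copied from the sample above (only the columns the grouping needs). It groups by `df.index.year` rather than `pd.Grouper(..., freq='A')`, which is my substitution to avoid the yearly frequency alias, since its spelling differs between pandas versions. The `distinct_bills` column is a hypothetical extra, not part of the question; it assumes you may also want to count each bill once when the same bill appears on several rows.

```python
import pandas as pd

# Three rows mirroring the sample in the question (subset of columns).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2006-05-01", "2006-05-01", "2007-03-27"]),
        "catcode": ["C4500", "C4500", "L1100"],
        "disposition": ["support", "support", "support"],
        "id.fec": ["S2AK00010", "S2AK00010", "S2AK00010"],
        "bills": ["s2686-109", "s2686-109", "s1000-110"],
    }
).set_index("date")

# Group by the calendar year of the index plus the three key columns.
keys = [df.index.year, "catcode", "disposition", "id.fec"]
grouped = df.groupby(keys)["bills"]

# 'size' counts the rows in each group and broadcasts the count back to every row.
df["billsum"] = grouped.transform("size")

# Hypothetical extra: 'nunique' counts distinct bill ids in each group.
df["distinct_bills"] = grouped.transform("nunique")

print(df[["bills", "billsum", "distinct_bills"]])
```

On this sample, `billsum` comes out as 2, 2, 1, matching the output above, while `distinct_bills` is 1 on every row because the two 2006 rows refer to the same bill.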