I have the following abbreviated DataFrame:
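import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/mpl.Bspons.merge.2'
df = pd.read_csv(url, index_col=0)  # loading step assumed; it is not shown in the post
df = df.reset_index()
df.head()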
df['billsum'] = df.groupby(['date', 'catcode','disposition', 'id.fec']).bills.transform('sum')
  catcode  date       bills     id.fec disposition  billsum
0   B1000  2013  hr2575-113  H0IN09070     support  hr2575-113
1   B2000  2013  hr2575-113  H0IN09070     support  hr2575-113
2   B3000  2013  hr2575-113  H0IN09070     support  hr2575-113hr2575-113
3   B6000  2013  hr2575-113  H0IN09070     support  hr2575-113hr2575-113hr2575-113hr2575-113hr2575...
4   B2000  2007   s1782-110  S8WI00026      oppose  s1782-110
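The problem is that I simply want a numeric total in the billsum column rather than a printout of all the bills. Because bills is a string column, 'sum' concatenates the strings instead of counting them. When I try using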
df['billsum'] = df.groupby(['date', 'catcode','disposition', 'id.fec']).bills.transform('size')
I get the following:
  catcode  date       bills     id.fec disposition  billsum
0   B1000  2013  hr2575-113  H0IN09070     support        1
1   B2000  2013  hr2575-113  H0IN09070     support        1
2   B3000  2013  hr2575-113  H0IN09070     support        2
3   B6000  2013  hr2575-113  H0IN09070     support        5
4   B2000  2007   s1782-110  S8WI00026      oppose        1
df.to_csv('mpl.billsum')  # to_csv takes no index_col argument; the index is written by default
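But what I need is the count of unique values per group. In my short df above no group contains more than one distinct bill, but in the larger dataset some do. Does anyone have any insight into this?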
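To make the goal concrete, here is a minimal sketch of the result I am after. It runs on a small made-up frame (`toy` is hypothetical data, not the real file) and assumes that passing `'nunique'` to `transform` is an acceptable way to count distinct bills per group. The `n_rows` column is only there to contrast it with `'size'`.

```python
import pandas as pd

# Made-up toy data (not the real file): the first group has three rows
# but only two distinct bills; the second group has a single row.
toy = pd.DataFrame({
    "catcode": ["B1000", "B1000", "B1000", "B2000"],
    "date": [2013, 2013, 2013, 2007],
    "bills": ["hr2575-113", "hr2575-113", "hr1-113", "s1782-110"],
    "id.fec": ["H0IN09070", "H0IN09070", "H0IN09070", "S8WI00026"],
    "disposition": ["support", "support", "support", "oppose"],
})

keys = ["date", "catcode", "disposition", "id.fec"]
grouped = toy.groupby(keys)["bills"]

# 'size' counts every row in the group, duplicates included.
toy["n_rows"] = grouped.transform("size")
# 'nunique' counts only the distinct bill strings in the group.
toy["billsum"] = grouped.transform("nunique")

print(toy[["catcode", "date", "bills", "n_rows", "billsum"]])
```

On this toy frame the first group gets `n_rows` of 3 and `billsum` of 2, which is the distinction I need on the full data.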