I have a dataset in which a single project code can have several subscriptions, like the one created below:
import pandas as pd

data = {'Project Code': [1622, 1622, 1622, 1622, 1622, 1622, 1622, 1622],
'Subscription Line': [1, 2, 1, 2, 1, 2, 1, 1],
'Date': ['4/1/2020', '4/1/2020', '5/1/2020', '5/1/2020', '6/1/2020', '6/1/2020', '7/1/2020', '8/1/2020'],
'Subscription Spend': [293, 195, 31, 200, 0, 0, 3270, 184],
'Projected Subscription Spend': [11758, 8970, 12261, 6807, 9963, 5480, 11885, 9900],
'Project-Month': ['1622April2020', ' 1622April2020', '1622May2020', '1622May2020', '1622June2020', '1622June2020', '1622July2020', '1622August2020']
}
df = pd.DataFrame(data, columns=['Project Code', 'Date', 'Subscription Spend', 'Projected Subscription Spend', 'Project-Month'])
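I want to compute a column that gives the projected spend at the project level, as the sum of 'Projected Subscription Spend'. So for April 2020, the projected project spend would be 11,758 + 8,970 = 20,728, and that value would appear on both rows. The projected project spend would therefore look like this: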
'Projected Project Spend' = [20728, 20728, 19068, 19068, 15443, 15443, 11885, 9900]
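I tried to do this with groupby and sum, but when I run the code, 'Projected Project Spend' comes out blank. When I use cumsum instead, I get values that behave the way you would expect from cumsum: they accumulate over time. The two lines of code I tried are: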
df['Projected Project Spend'] = (df['Subscription Spend']).groupby(df['Subscription Code']).sum()
df['Projected Project Spend'] = (df['Projected Subscription Spend']).groupby(df['Project-Month']).cumsum()
Why is the output blank with sum but not with cumsum? How can I get the sum?
Answer 0 (score: 0)
Try this code:
import pandas as pd

data = {'Project Code': [1622, 1622, 1622, 1622, 1622, 1622, 1622, 1622],
'Subscription Line': [1, 2, 1, 2, 1, 2, 1, 1],
'Date': ['4/1/2020', '4/1/2020', '5/1/2020', '5/1/2020', '6/1/2020', '6/1/2020', '7/1/2020', '8/1/2020'],
'Subscription Spend': [293, 195, 31, 200, 0, 0, 3270, 184],
'Projected Subscription Spend': [11758, 8970, 12261, 6807, 9963, 5480, 11885, 9900],
'Project-Month': ['1622April2020', ' 1622April2020', '1622May2020', '1622May2020', '1622June2020', '1622June2020', '1622July2020', '1622August2020']
}
df = pd.DataFrame(data, columns=['Project Code', 'Date', 'Subscription Spend', 'Projected Subscription Spend', 'Project-Month'])
df['month'] = df['Project-Month'].str[4:-4]  # create a new column for the month
df.iloc[1, -1] = 'April'  # the second row was reading '2April' because of the leading space, so correct it
df.groupby('month')['Projected Subscription Spend'].sum()  # one total per month
Grouping by Project-Month may not work correctly because the second row is malformed (it has a leading space); I have corrected that here.
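A more general way to handle the malformed key is to strip whitespace from the whole column rather than patching one cell by position. The following is only a minimal sketch on a cut-down copy of the question's data; the names `groups_before`, `groups_after` and `totals` are made up for illustration.

```python
import pandas as pd

# Cut-down copy of the question's data; note the leading space in row 1.
df = pd.DataFrame({
    "Project-Month": ["1622April2020", " 1622April2020", "1622May2020", "1622May2020"],
    "Projected Subscription Spend": [11758, 8970, 12261, 6807],
})

# Before cleaning, the padded key counts as a separate group.
groups_before = df["Project-Month"].nunique()

# Strip surrounding whitespace from every key in one step.
df["Project-Month"] = df["Project-Month"].str.strip()
groups_after = df["Project-Month"].nunique()

# One total per cleaned Project-Month key.
totals = df.groupby("Project-Month")["Projected Subscription Spend"].sum()
print(groups_before, groups_after)
print(totals)
```

This avoids depending on the row position of the bad value, which matters if more than one key is padded.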
Answer 1 (score: 0)
Similar to Chris's comment, but passing 'sum' as a string for better performance:
df['Total_Spend'] = (df.groupby(['Project Code', 'Date'])
['Projected Subscription Spend'].transform('sum')
)
Output:
    Project Code      Date  Subscription Spend  Projected Subscription Spend   Project-Month  Total_Spend
--  ------------  --------  ------------------  ----------------------------  --------------  -----------
 0          1622  4/1/2020                 293                         11758   1622April2020        20728
 1          1622  4/1/2020                 195                          8970   1622April2020        20728
 2          1622  5/1/2020                  31                         12261     1622May2020        19068
 3          1622  5/1/2020                 200                          6807     1622May2020        19068
 4          1622  6/1/2020                   0                          9963    1622June2020        15443
 5          1622  6/1/2020                   0                          5480    1622June2020        15443
 6          1622  7/1/2020                3270                         11885    1622July2020        11885
 7          1622  8/1/2020                 184                          9900  1622August2020         9900
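As for why sum came out blank while cumsum did not: the likely cause is index alignment. groupby(...).sum() returns one row per group, indexed by the group key, whereas cumsum and transform('sum') return one row per original row with the original index. The sketch below illustrates this on a cut-down copy of the data; the column names `via_sum` and `via_transform` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Project-Month": ["1622April2020", "1622April2020", "1622May2020", "1622May2020"],
    "Projected Subscription Spend": [11758, 8970, 12261, 6807],
})

grouped = df.groupby("Project-Month")["Projected Subscription Spend"]

# sum() collapses each group to a single row, indexed by the group key.
per_group = grouped.sum()
# transform('sum') broadcasts the group total back onto the original row index.
per_row = grouped.transform("sum")

# Column assignment aligns on index labels. The keys of per_group
# ('1622April2020', ...) match none of the row labels 0..3, so this column is all NaN.
df["via_sum"] = per_group
# per_row shares the frame's index, so every row receives its group total.
df["via_transform"] = per_row

print(df)
```

cumsum behaves like transform in this respect: it keeps the original index, which is why the second line in the question filled the column.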