我想在我的Pandas数据框中添加累积和列,以便:
name | day | no
-----|-----------|----
Jack | Monday | 10
Jack | Tuesday | 20
Jack | Tuesday | 10
Jack | Wednesday | 50
Jill | Monday | 40
Jill | Wednesday | 110
变为:
Jack | Monday | 10 | 10
Jack | Tuesday | 30 | 40
Jack | Wednesday | 50 | 90
Jill | Monday | 40 | 40
Jill | Wednesday | 110 | 150
我尝试了df.groupby
和df.agg(lambda x: cumsum(x))
的各种组合无济于事。
答案 0 :(得分:53)
这应该这样做,需要groupby()
两次。
In [52]:
print df
name day no
0 Jack Monday 10
1 Jack Tuesday 20
2 Jack Tuesday 10
3 Jack Wednesday 50
4 Jill Monday 40
5 Jill Wednesday 110
In [53]:
print df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()
no
name day
Jack Monday 10
Tuesday 40
Wednesday 90
Jill Monday 40
Wednesday 150
注意,生成的DataFrame
有MultiIndex
。
答案 1 :(得分:35)
这适用于pandas 0.16.2
In[23]: print df
name day no
0 Jack Monday 10
1 Jack Tuesday 20
2 Jack Tuesday 10
3 Jack Wednesday 50
4 Jill Monday 40
5 Jill Wednesday 110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print df
name day no no_cumulative
0 Jack Monday 10 10
1 Jack Tuesday 20 30
2 Jack Tuesday 10 40
3 Jack Wednesday 50 90
4 Jill Monday 40 40
5 Jill Wednesday 110 150
答案 2 :(得分:15)
修改@ Dmitry的回答。这更简单,适用于pandas 0.19.0:
workingDir = 'frames';
filename = [sprintf('%d',1) '.jpg'];
fullname = fullfile(workingDir,'imagesV',filename);
a2=imread(fullname);
a2(1,1)=1;
imwrite(a2,fullname);
a1=imread(fullname);
if a1==a2
disp(23)
else
disp(45)
end
答案 3 :(得分:5)
而不是df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()
(见上文)你也可以做df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()
df.groupby(by=['name','day']).sum()
实际上只是将两列都移动到MultiIndex as_index=False
表示您之后无需再调用reset_index 答案 4 :(得分:3)
你应该使用
df['cum_no'] = df.no.cumsum()
http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html
另一种方法
import pandas as pd
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
'C2' : [1,2,3,4,5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df
答案 5 :(得分:0)
data.csv:
name,day,no
Jack,Monday,10
Jack,Tuesday,20
Jack,Tuesday,10
Jack,Wednesday,50
Jill,Monday,40
Jill,Wednesday,110
代码:
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
df = df.groupby(['name', 'day'])['no'].sum().reset_index()
print(df)
df['cumsum'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
print(df)
输出:
name day no
0 Jack Monday 10
1 Jack Tuesday 20
2 Jack Tuesday 10
3 Jack Wednesday 50
4 Jill Monday 40
5 Jill Wednesday 110
name day no
0 Jack Monday 10
1 Jack Tuesday 30
2 Jack Wednesday 50
3 Jill Monday 40
4 Jill Wednesday 110
name day no cumsum
0 Jack Monday 10 10
1 Jack Tuesday 30 40
2 Jack Wednesday 50 90
3 Jill Monday 40 40
4 Jill Wednesday 110 150