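I am trying to create a new column that shows a cumulative count based on the values in separate columns.
For the code below, I want to create two new columns based on the Cause and Answer columns. For each value in the Answer column, if the Cause column contains In on that row, the new column should show a running count for that value.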
import pandas as pd
d = ({
'Cause' : ['In','','','In','','In','In'],
'Answer' : ['Yes','No','Maybe','No','Yes','No','Yes'],
})
df = pd.DataFrame(d)
Output:
  Answer Cause
0    Yes    In
1     No
2  Maybe
3     No    In
4    Yes
5     No    In
6    Yes    In
Expected output:
  Answer Cause Count_No Count_Yes
0    Yes    In                  1
1     No
2  Maybe
3     No    In        1
4    Yes
5     No    In        2
6    Yes    In                  2
I tried the following, but it raises an error.
df['cumsum'] = df.groupby(['Answer'])['Cause'].cumsum()
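The attempt above most likely fails because Cause holds strings, and a cumulative sum needs numeric values. As a minimal sketch (not the asker's code), the string column can first be turned into a 0/1 flag and then summed within each Answer group. The names is_in and running are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Cause': ['In', '', '', 'In', '', 'In', 'In'],
    'Answer': ['Yes', 'No', 'Maybe', 'No', 'Yes', 'No', 'Yes'],
})

# Numeric flag: 1 where Cause is 'In', 0 otherwise.
is_in = df['Cause'].eq('In').astype(int)

# Running total of the flag within each Answer group.
running = is_in.groupby(df['Answer']).cumsum()

# Keep the count only on rows where Cause is 'In'; other rows become NaN.
df['cumsum'] = running.where(is_in.eq(1))
```

This puts all counts in a single column. Splitting them into one column per Answer value, as in the expected output, still needs a further step.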
Answer 0 (score: 2)
Here is one approach:
for val in ['Yes', 'No']:
    # Rows where Answer equals val and Cause is 'In'
    cond = df.Answer.eq(val) & df.Cause.eq('In')
    # Number the matching rows 1, 2, ...; all other rows stay NaN
    df.loc[cond, 'Count_' + val] = cond[cond].cumsum()
df
#   Cause Answer  Count_Yes  Count_No
# 0    In    Yes        1.0       NaN
# 1           No        NaN       NaN
# 2        Maybe        NaN       NaN
# 3    In     No        NaN       1.0
# 4          Yes        NaN       NaN
# 5    In     No        NaN       2.0
# 6    In    Yes        2.0       NaN
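The counts above come out as floats (1.0, 2.0) because the unmatched rows hold NaN. If whole numbers are preferred, one possible variant, offered as a sketch and not as part of the original answer, is to cast each new column to pandas' nullable Int64 dtype, which shows missing entries as <NA>.

```python
import pandas as pd

df = pd.DataFrame({
    'Cause': ['In', '', '', 'In', '', 'In', 'In'],
    'Answer': ['Yes', 'No', 'Maybe', 'No', 'Yes', 'No', 'Yes'],
})

for val in ['Yes', 'No']:
    # Rows where Answer equals val and Cause is 'In'.
    cond = df['Answer'].eq(val) & df['Cause'].eq('In')
    # Running count over all rows, kept only where cond holds,
    # then cast to the nullable integer dtype.
    counts = cond.astype(int).cumsum().where(cond)
    df['Count_' + val] = counts.astype('Int64')
```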
Answer 1 (score: 1)
Without a for loop :-)
s=df.loc[df.Cause=='In'].Answer.str.get_dummies()
pd.concat([df,s.cumsum().mask(s!=1,'')],axis=1).fillna('')
Out[62]:
  Answer Cause No Yes
0    Yes    In      1
1     No
2  Maybe
3     No    In  1
4    Yes
5     No    In  2
6    Yes    In      2
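The result above names the new columns No and Yes, while the question asks for Count_No and Count_Yes. A small variation on the same get_dummies idea, sketched here under the assumption that NaN is acceptable in place of empty strings, adds the prefix and joins on the index. The names counts and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Cause': ['In', '', '', 'In', '', 'In', 'In'],
    'Answer': ['Yes', 'No', 'Maybe', 'No', 'Yes', 'No', 'Yes'],
})

# One-hot encode Answer, but only for rows where Cause is 'In'.
s = df.loc[df['Cause'].eq('In'), 'Answer'].str.get_dummies()

# Running totals per column, kept only where that row's indicator is 1.
counts = s.cumsum().where(s.eq(1)).add_prefix('Count_')

# Align on the index; rows without 'In' get NaN in both new columns.
out = df.join(counts)
```

Keeping NaN instead of '' leaves the new columns numeric, which may be easier to work with downstream; calling out.fillna('') would reproduce the blank cells shown in the answer.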