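I want to create groups within each Chat_id whenever the Success or Failure column is True. The value of Group changes when the previous row has a True value in either the Success or the Failure column. How should I do this in pandas? So basically, I want to create the Group column; the Chat_id, Success and Failure columns already exist.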
+---------+-------+---------+---------+
| Chat_id | Group | Success | Failure |
+---------+-------+---------+---------+
| A | 0 | FALSE | FALSE |
| A | 0 | FALSE | FALSE |
| A | 0 | TRUE | FALSE |
| A | 1 | FALSE | FALSE |
| A | 1 | FALSE | TRUE |
| A | 2 | FALSE | FALSE |
| A | 2 | FALSE | FALSE |
| B | 0 | FALSE | FALSE |
| B | 0 | FALSE | FALSE |
| B | 0 | FALSE | TRUE |
| B | 1 | FALSE | FALSE |
| B | 1 | FALSE | FALSE |
| B | 1 | FALSE | FALSE |
| C | 0 | FALSE | FALSE |
| C | 0 | TRUE | FALSE |
| C | 1 | FALSE | FALSE |
| C | 1 | TRUE | FALSE |
+---------+-------+---------+---------+
I also tried the following, but it doesn't seem to work.
def groupping(dfg):
    ind=0:
    for row in dfg:
        if row.Success==True or row.Failure==True:
            ind+=1
    return ind
df.groupby(chat_id).apply(lambda x: grouping(x))
Answer 0 (score: 4)
cumsum
Make a new column 'Flag'
df = df.assign(Flag=(df.Success | df.Failure).cumsum())
df
Chat_id Group Success Failure Flag
0 A 0 False False 0
1 A 0 False False 0
2 A 0 True False 1
3 A 1 False False 1
4 A 1 False True 2
5 A 2 False False 2
6 A 2 False False 2
7 B 0 False False 2
8 B 0 False False 2
9 B 0 False True 3
10 B 1 False False 3
11 B 1 False False 3
12 B 1 False False 3
13 C 0 False False 3
14 C 0 True False 4
15 C 1 False False 4
16 C 1 True False 5
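Why this works: in a cumulative sum each True counts as 1 and each False as 0, so the running total goes up by one on every row where either column is True. Note that the count runs over the whole frame and does not restart at a new Chat_id, which is why B starts at 2 in the output above. A minimal sketch on a small made-up frame (the name `demo` and its values are hypothetical, not the question's data):

```python
import pandas as pd

# Hypothetical toy data: two chats with three rows each.
demo = pd.DataFrame({
    "Chat_id": ["A", "A", "A", "B", "B", "B"],
    "Success": [False, True, False, False, False, True],
    "Failure": [False, False, False, True, False, False],
})

# True where either column is True on that row.
event = demo["Success"] | demo["Failure"]

# Running count of events over the whole frame (no reset per Chat_id).
flag = event.cumsum()
print(flag.tolist())
```

Here chat B starts at 1 rather than 0, because the single event in chat A has already been counted.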
Answer 1 (score: 2)
It's more like
df[['Success','Failure']].sum(1).gt(0).groupby(df.Chat_id).cumsum()
Out[273]:
0 0.0
1 0.0
2 1.0
3 1.0
4 2.0
5 2.0
6 2.0
7 0.0
8 0.0
9 1.0
10 1.0
11 1.0
12 1.0
13 0.0
14 1.0
15 1.0
16 2.0
dtype: float64
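The per-chat count above increments on the row that holds the True value, whereas the Group column in the question only changes on the row after it. One possible way to get that, sketched here under the assumption that the question's table is rebuilt by hand, is to shift the per-chat cumulative sum down by one row within each chat. This is a variation on the answer, not part of it; the names `hit` and `per_chat` are made up for illustration.

```python
import pandas as pd

# The question's sample data, rebuilt by hand (Group is what we want to compute).
df = pd.DataFrame({
    "Chat_id": ["A"] * 7 + ["B"] * 6 + ["C"] * 4,
    "Success": [False, False, True, False, False, False, False,
                False, False, False, False, False, False,
                False, True, False, True],
    "Failure": [False, False, False, False, True, False, False,
                False, False, True, False, False, False,
                False, False, False, False],
})

# 1 where either column is True, else 0.
hit = (df["Success"] | df["Failure"]).astype(int)

# Running count of events that restarts for every Chat_id.
per_chat = hit.groupby(df["Chat_id"]).cumsum()

# Shift by one row inside each chat so the group changes on the row AFTER the event.
df["Group"] = per_chat.groupby(df["Chat_id"]).shift(fill_value=0).astype(int)
print(df)
```

The `fill_value=0` fills the first row of each chat, which has no previous row to take a value from.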
Fixing your code
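import pandas as pd

def grouping(dfg):
    ind = 0
    l = []
    # iterrows() yields (index, row) pairs; iterating the frame directly only yields column names
    for _, row in dfg.iterrows():
        if row.Success or row.Failure:
            ind += 1
        # record the running count for every row, not just once per group
        l.append(ind)
    return pd.Series(l)
df.groupby('Chat_id').apply(grouping)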
Out[292]:
Chat_id
A 0 0
1 0
2 1
3 1
4 2
5 2
6 2
B 0 0
1 0
2 1
3 1
4 1
5 1
C 0 0
1 1
2 1
3 2
dtype: int64