I have a pandas DataFrame with three columns:
import pandas as pd
di={'id':[1,1,2,3,4,4],'b':['Sydney','Bexley','Arncliffe','Hurstville','Bexley North','Carlton'],
'c':['contra','contra','contra_approved','contra','contra_approved','contra']}
df=pd.DataFrame(di)
df.head(10)
id b c
1 Sydney contra
1 Bexley contra
2 Arncliffe contra_approved
3 Hurstville contra
4 Bexley North contra_approved
4 Carlton contra
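Every id should have the keyword contra_approved in column 'c' at least once. For an id that has none, the first row of that id should be set to contra_approved.
The final DataFrame should be: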
id b c
1 Sydney contra_approved
1 Bexley contra
2 Arncliffe contra_approved
3 Hurstville contra_approved
4 Bexley North contra_approved
4 Carlton contra
How can I implement this logic in pandas?
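A quick way to see which ids currently break the rule, as a minimal sketch that rebuilds the example frame; `has_approved` is a hypothetical helper name introduced here, not part of pandas:

```python
import pandas as pd

di = {'id': [1, 1, 2, 3, 4, 4],
      'b': ['Sydney', 'Bexley', 'Arncliffe', 'Hurstville', 'Bexley North', 'Carlton'],
      'c': ['contra', 'contra', 'contra_approved', 'contra', 'contra_approved', 'contra']}
df = pd.DataFrame(di)

def has_approved(frame):
    # hypothetical helper: one boolean per id, True when at least one
    # row of that id already has c == 'contra_approved'
    return frame['c'].eq('contra_approved').groupby(frame['id']).any()

print(has_approved(df))
```

Here ids 1 and 3 come back False, which are exactly the ids whose first row changes in the expected output.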
Answer 0 (score: 1)
Let's try:
# check if all rows within same `id` have `c==contra`
g = df['c'].eq('contra').groupby(df['id']).transform('all')
# switch the first row of each such group to `contra_approved`,
# regardless of how many rows the group has
df.loc[g & (~df.duplicated('id')), 'c'] = 'contra_approved'
Output:
id b c
0 1 Sydney contra_approved
1 1 Bexley contra
2 2 Arncliffe contra_approved
3 3 Hurstville contra_approved
4 4 Bexley North contra_approved
5 4 Carlton contra
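To see what the two masks in Answer 0 look like, here is a self-contained sketch that rebuilds the example frame and prints them before the assignment; the names `all_contra` and `first_row` are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 3, 4, 4],
    'b': ['Sydney', 'Bexley', 'Arncliffe', 'Hurstville', 'Bexley North', 'Carlton'],
    'c': ['contra', 'contra', 'contra_approved', 'contra', 'contra_approved', 'contra'],
})

# per-id result broadcast back to every row of that id:
# True only when every 'c' value in the id equals 'contra'
all_contra = df['c'].eq('contra').groupby(df['id']).transform('all')

# True on the first row of each id (duplicated marks the later repeats)
first_row = ~df.duplicated('id')

print(pd.DataFrame({'id': df['id'], 'all_contra': all_contra, 'first_row': first_row}))

# only rows where both masks are True get rewritten
df.loc[all_contra & first_row, 'c'] = 'contra_approved'
print(df)
```

Note that `transform('all')` tests that every value is exactly 'contra'. With only the two labels in this data, that is the same as "no contra_approved in the group"; if other labels could appear, a check like `~df['c'].eq('contra_approved').groupby(df['id']).transform('any')` would match the stated rule more directly.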
Answer 1 (score: 1)
You can try:
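def f(d):
    # d is the sub-frame for one id; copy it so the original df is not written through a view
    d = d.copy()
    if "contra_approved" not in d["c"].unique():
        # no approved row in this id yet: mark its first row
        d.loc[d.index[0], "c"] = "contra_approved"
    return d
# group_keys=False keeps the original row index (no extra 'id' index level in pandas 2.x)
df = df.groupby("id", group_keys=False).apply(f)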
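A self-contained run of the per-group function approach, as a sketch assuming pandas 1.x or 2.x. `group_keys=False` keeps the original row index, and the result is stored in a new name `out` (introduced here) so the input frame can be compared afterwards. Depending on the pandas version, `apply` may emit a warning about operating on the grouping column; the sketch only relies on column `c`.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 3, 4, 4],
    'b': ['Sydney', 'Bexley', 'Arncliffe', 'Hurstville', 'Bexley North', 'Carlton'],
    'c': ['contra', 'contra', 'contra_approved', 'contra', 'contra_approved', 'contra'],
})

def f(d):
    # work on a copy of this id's rows
    d = d.copy()
    if "contra_approved" not in d["c"].unique():
        # this id has no approved row: mark the first one
        d.loc[d.index[0], "c"] = "contra_approved"
    return d

# groups are concatenated back in id order with their original index labels
out = df.groupby("id", group_keys=False).apply(f)
print(out)
```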
Answer 2 (score: 0)
g = df.groupby('id').head(1)
# DataFrame.append was removed in pandas 2.0; pd.concat combines the two parts instead
pd.concat([df[~df.isin(g)].dropna(), g.replace(regex='^contra$', value='contra_approved')]).sort_values(by='id')
id b c
1 1.0 Bexley contra
0 1.0 Sydney contra_approved
2 2.0 Arncliffe contra_approved
3 3.0 Hurstville contra_approved
5 4.0 Carlton contra
4 4.0 Bexley North contra_approved
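How it works:
g=df.groupby('id').head(1)
# isolate the first row of each group
g.replace(regex='^contra$',value='contra_approved')
# replace contra with contra_approved in g
df[~df.isin(g)]
# isolate the rows that are not the first row of their group (the rows of g become NaN and are removed by .dropna())
Finally, combine the results of the second and third steps.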
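The steps above, written out as a self-contained sketch with intermediate names (`g_fixed`, `rest` and `result` are illustrative, not from the answer) and `pd.concat` in place of `DataFrame.append`, which pandas 2.0 removed:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 3, 4, 4],
    'b': ['Sydney', 'Bexley', 'Arncliffe', 'Hurstville', 'Bexley North', 'Carlton'],
    'c': ['contra', 'contra', 'contra_approved', 'contra', 'contra_approved', 'contra'],
})

# step 1: first row of each id
g = df.groupby('id').head(1)

# step 2: an exact 'contra' becomes 'contra_approved' in those first rows only
g_fixed = g.replace(regex='^contra$', value='contra_approved')

# step 3: isin with a DataFrame matches on index and column labels, so the rows
# of g turn all-NaN after masking and .dropna() removes them
rest = df[~df.isin(g)].dropna()

# step 4: combine and sort by id
result = pd.concat([rest, g_fixed]).sort_values(by='id')
print(result)
```

Because the masked rows pass through NaN, the `id` column comes back as float (1.0, 2.0, ...), as in the output above. Also note that this approach rewrites the first row of every id whose first `c` is exactly 'contra', even if a later row of that id is already 'contra_approved'.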
Answer 3 (score: 0)
Let's try:
cond = (df.groupby('id').cumcount().eq(0)
        & ~df.id.isin(df.loc[df.c.eq('contra_approved'), 'id']))
df.loc[cond,'c']='contra_approved'
df
Out[146]:
id b c
0 1 Sydney contra_approved
1 1 Bexley contra
2 2 Arncliffe contra_approved
3 3 Hurstville contra_approved
4 4 Bexley North contra_approved
5 4 Carlton contra
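A self-contained sketch that splits Answer 3's condition into its two parts so each mask can be inspected; `is_first` and `approved_ids` are names introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 3, 4, 4],
    'b': ['Sydney', 'Bexley', 'Arncliffe', 'Hurstville', 'Bexley North', 'Carlton'],
    'c': ['contra', 'contra', 'contra_approved', 'contra', 'contra_approved', 'contra'],
})

# cumcount numbers rows 0, 1, 2, ... within each id; 0 marks the first row
is_first = df.groupby('id').cumcount().eq(0)

# ids that already contain at least one 'contra_approved'
approved_ids = df.loc[df.c.eq('contra_approved'), 'id']

# first row of an id that has no approved row yet
cond = is_first & ~df.id.isin(approved_ids)

df.loc[cond, 'c'] = 'contra_approved'
print(df)
```

Unlike the `transform('all')` check in Answer 0, this condition looks for 'contra_approved' directly, so it does not depend on 'contra' being the only other label.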