Hello, I have two columns, appt_number and status. I am interested in rows that share a duplicated appt_number, for example:
appt_number status
191624 100001718895 complete
41105 100001718895 notdone
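I want to get all the cases where the status is first notdone and then something else, as in the example above. A case like this one, for example, does not matter to me: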
81735 100002203648 cancelled
81738 100002203648 suspended
because it does not start with notdone.
I tried:
print(df[['appt_number','status']].sort_values('appt_number', ascending=True))
But I got the following, so I need to clean up this result to get the desired cases:
appt_number status
140935 100000444380 complete
77626 1000011340 complete
222687 100001204805 complete
191624 100001718895 complete
41105 100001718895 notdone
293961 100002049980 complete
81735 100002203648 cancelled
81738 100002203648 suspended
76059 100003318442 complete
287598 100003867456 complete
7733 100004968279 complete
276560 100006105890 complete
166713 10000685700 complete
So I would really appreciate help with this difficult task. After trying the helpful feedback:
df['counter'] = df.groupby('appt_number').status.transform('size')
df = df[df.counter >=2]
df = df[df['status'].isin(['cancelled','complete','notdone','pending','suspended'])]
#df = df[df.status == 'notdone']
print(df[['appt_number', 'status']].sort_values('appt_number', ascending=True))
However, I got:
appt_number status
41105 100001718895 notdone
191624 100001718895 complete
81738 100002203648 suspended
81735 100002203648 cancelled
227320 100011167163 pending
274408 100011167163 suspended
241047 100011167163 suspended
274414 100011167163 complete
274409 100011167163 suspended
137816 100012143654 complete
But I am only interested in the cases that were initially notdone and then changed, like this:
appt_number status
41105 100001718895 notdone
191624 100001718895 complete
So I would really appreciate help getting these cases.
Answer 0 (score: 1)
This should do the trick:
In:
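import pandas as pd
# Keep the two relevant columns, sorted by appointment number
df = df[['appt_number','status']].sort_values(by='appt_number', ascending=True)
# Rows whose status is 'notdone'
df2 = df.loc[df.status == 'notdone']
# Inner merge keeps only appt_numbers that have at least one 'notdone' row
df3 = pd.merge(df, df2, on='appt_number')
df3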
Out:
appt_number status_x status_y
0 101420561364 notdone notdone
1 139015260682 notdone notdone
...
n 139144839318 notdone notdone
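Note that the merge puts the original status in status_x and pairs each row with every notdone row of the same appt_number. If the goal is simply to keep the original rows of every duplicated appt_number that has a notdone entry, a boolean mask may be easier to read. This is only a sketch: the sample frame reuses rows from the question, plus one made-up single-row appointment (appt_number 999) to show that non-duplicated entries are dropped.

```python
import pandas as pd

# Sample data: rows taken from the question, except the last one,
# which is hypothetical (a lone 'notdone' appointment).
df = pd.DataFrame(
    {
        "appt_number": [100001718895, 100001718895,
                        100002203648, 100002203648, 999],
        "status": ["notdone", "complete", "suspended", "cancelled", "notdone"],
    },
    index=[41105, 191624, 81738, 81735, 1],
)

# Appointment numbers that have at least one 'notdone' row
notdone_ids = df.loc[df["status"] == "notdone", "appt_number"]

# True for every row whose appt_number occurs more than once
has_duplicates = df.duplicated(subset="appt_number", keep=False)

# Keep all rows of duplicated appointments that include a 'notdone' row
result = df[df["appt_number"].isin(notdone_ids) & has_duplicates]
print(result)
```

This keeps the frame in its original shape (one status column) and avoids the extra rows a merge produces when an appointment has several notdone entries.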
Answer 1 (score: 0)
Try this:
df['counter'] = df.groupby('appt_number').status.transform('size')
df = df[df.counter >=2]
df = df[df.status == 'notdone']
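Note that the last filter keeps only the notdone rows themselves, not the later statuses of the same appointment. A possible variant that keeps the whole group when its first status is notdone is sketched below. It assumes the row index reflects chronological order (lower index = earlier), which the question does not confirm; the appt_number 1 group is made up to show a case where notdone comes second.

```python
import pandas as pd

# Sample data: rows from the question, plus a hypothetical group
# (appt_number 1) where 'notdone' is NOT the first status.
df = pd.DataFrame(
    {
        "appt_number": [100001718895, 100001718895,
                        100002203648, 100002203648, 1, 1],
        "status": ["complete", "notdone",
                   "cancelled", "suspended", "complete", "notdone"],
    },
    index=[191624, 41105, 81735, 81738, 10, 20],
)

# Assumption: a lower index means the row was recorded earlier
ordered = df.sort_index()
grouped = ordered.groupby("appt_number")["status"]

# First status of each appointment, broadcast back to every row
first_status = grouped.transform("first")
# Number of distinct statuses per appointment, broadcast to every row
n_statuses = grouped.transform("nunique")

# Appointments that start as 'notdone' and later take another status
result = ordered[(first_status == "notdone") & (n_statuses > 1)]
print(result)
```

If a real timestamp column is available, sorting by it instead of the index would be a safer choice.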