How to get the following query?

Date: 2017-03-15 01:06:15

Tags: python-3.x pandas

Hi, I have two columns, appt_number and status, and I'm interested in rows with a duplicated appt_number, for example:

            appt_number     status
191624     100001718895   complete
41105      100001718895    notdone

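I only want the appointments whose first status is notdone and that then change to something else, as in the example above. For example, this case doesn't matter to me: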

81735      100002203648  cancelled
81738      100002203648  suspended

because it does not start with notdone.

I tried:

print(df[['appt_number', 'status']].sort_values('appt_number', ascending=True))

but that gives me the following, so I still need to clean up this result to get the cases I want:
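As a first cleanup step, one option (not from the original post) is `DataFrame.duplicated` with `keep=False`, which flags every row whose `appt_number` occurs more than once, so single-row appointments drop out. A minimal sketch using a few rows copied from the output above:

```python
import pandas as pd

# A few rows copied from the sorted output above; index labels kept as-is.
df = pd.DataFrame(
    {
        "appt_number": [100000444380, 100001718895, 100001718895,
                        100002203648, 100002203648, 100003318442],
        "status": ["complete", "complete", "notdone",
                   "cancelled", "suspended", "complete"],
    },
    index=[140935, 191624, 41105, 81735, 81738, 76059],
)

# keep=False marks *all* occurrences of a repeated appt_number, not only the later ones.
dupes = df[df.duplicated("appt_number", keep=False)]
print(dupes.sort_values("appt_number"))
```

This does the same job as the `counter >= 2` filter, without adding a helper column.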

            appt_number     status
140935     100000444380   complete
77626        1000011340   complete
222687     100001204805   complete
191624     100001718895   complete
41105      100001718895    notdone
293961     100002049980   complete
81735      100002203648  cancelled
81738      100002203648  suspended
76059      100003318442   complete
287598     100003867456   complete
7733       100004968279   complete
276560     100006105890   complete
166713      10000685700   complete

I'd really appreciate help with this tricky task. After trying the helpful feedback I received:

df['counter'] = df.groupby('appt_number').status.transform('size')
df = df[df.counter >=2]
df = df[df['status'].isin(['cancelled','complete','notdone','pending','suspended'])]
#df = df[df.status == 'notdone']
print(df[['appt_number', 'status']].sort_values('appt_number', ascending=True))

However, I got:

            appt_number     status
41105      100001718895    notdone
191624     100001718895   complete
81738      100002203648  suspended
81735      100002203648  cancelled
227320     100011167163    pending
274408     100011167163  suspended
241047     100011167163  suspended
274414     100011167163   complete
274409     100011167163  suspended
137816     100012143654   complete

But I'm only interested in the appointments that start as notdone and then change, like this:

            appt_number     status
41105      100001718895    notdone
191624     100001718895   complete

So I'd really appreciate help getting these cases.
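One way to get exactly these cases, sketched under an assumption the question does not state: that the row index reflects the order in which statuses were recorded (in the desired output, notdone at 41105 comes before complete at 191624). If the real data has a timestamp column, sort by that instead. The idea is to sort chronologically, then use `groupby(...).transform` to broadcast each group's size and first status back onto every row:

```python
import pandas as pd

# Rows taken from the question's output.
# ASSUMPTION: a smaller index label means an earlier record.
df = pd.DataFrame(
    {
        "appt_number": [100001718895, 100001718895, 100002203648, 100002203648,
                        100011167163, 100011167163, 100012143654],
        "status": ["notdone", "complete", "suspended", "cancelled",
                   "pending", "suspended", "complete"],
    },
    index=[41105, 191624, 81738, 81735, 227320, 274408, 137816],
)

# Put rows into the (assumed) chronological order.
df = df.sort_index()

grouped = df.groupby("appt_number")["status"]
size = grouped.transform("size")            # number of rows per appt_number
first_status = grouped.transform("first")   # earliest status per appt_number

# Keep appointments seen at least twice whose first status is 'notdone'.
result = df[(size >= 2) & (first_status == "notdone")]
print(result)
```

If an appointment with only repeated notdone rows should not count as a "change", a further condition such as `grouped.transform("nunique") > 1` could be added.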

2 answers:

Answer 0 (score: 1)

This should do it:

In:

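import pandas as pd

df = df[['appt_number','status']].sort_values(by='appt_number', ascending=True)
df2 = df.loc[df.status == 'notdone']        # only the 'notdone' rows
df3 = pd.merge(df, df2, on='appt_number')   # every row of an appointment that has a 'notdone' row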
df3
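Note that this inner merge keeps every row of any appointment that has a notdone row anywhere, regardless of order, and duplicates the status column as `status_x`/`status_y`. If the aim is just to select those appointments' rows, `Series.isin` gives the same selection without the extra column (and without duplicating rows when an appointment has several notdone rows). A minimal sketch on rows from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "appt_number": [100001718895, 100001718895, 100002203648, 100002203648],
        "status": ["notdone", "complete", "cancelled", "suspended"],
    }
)

# appt_numbers that have at least one 'notdone' row
notdone_ids = df.loc[df.status == "notdone", "appt_number"]

# Keep all rows for those appointments; no status_x/status_y columns are created.
df3 = df[df.appt_number.isin(notdone_ids)]
print(df3)
```

Like the merge, this still ignores whether notdone came first.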

Output:

    appt_number     status_x    status_y
0   101420561364    notdone     notdone
1   139015260682    notdone     notdone
...
n   139144839318    notdone     notdone

Answer 1 (score: 0)

Try this little guy:

df['counter'] = df.groupby('appt_number').status.transform('size')
df = df[df.counter >=2]
df = df[df.status == 'notdone']
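The last line keeps only the notdone rows themselves, so the complete row that follows is dropped. To keep the whole appointment instead, one option is `DataFrameGroupBy.filter`, which keeps or drops each group as a unit. A sketch, again assuming (not stated in the question) that index order is chronological:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "appt_number": [100001718895, 100001718895, 100002203648, 100002203648],
        "status": ["complete", "notdone", "cancelled", "suspended"],
    },
    index=[191624, 41105, 81735, 81738],
)

# ASSUMPTION: index order is chronological, so sort by it before grouping.
df = df.sort_index()

# Keep a whole appointment only if it has 2+ rows and its first status is 'notdone'.
out = df.groupby("appt_number").filter(
    lambda g: len(g) >= 2 and g["status"].iloc[0] == "notdone"
)
print(out)
```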