Pandas groupby with two conditions

Asked: 2016-07-22 19:18:27

Tags: python pandas group-by

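I have the following table (posted as a screenshot in the original question; the same data is pasted as text at the end of this question):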

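I am trying to do two things with the table:

1) If a caller (ANI) appears only once, any such single-call entry that also has a ZIP entry should get a 1 under ORDER.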

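# Work with callers that appear only once
import pandas as pd

def order_chk(row):
    # 1 only when the row has both an order timestamp and a ZIP code
    if pd.isnull(row['ORDER_TIMESTAMP']) or pd.isnull(row['ZIP']):
        return 0
    return 1

# Keep the ANI groups with a single call, then score each remaining row
calls_t = calls.groupby('ANI').filter(lambda g: len(g) < 2).apply(order_chk, axis=1)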

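2) It gets trickier when there are two calls but only one order. In those cases I want the call that is closer to the order to get the 1 under the ORDER column (the delta column holds timedelta objects).

So the final table would look like this (in the original screenshot, yellow shading marks the rows with ORDER = 1):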

Let me know if I can clarify anything. I have a feeling I am missing something really silly about using apply on groups.

    DATE       TIMESTAMP            ANI          DNIS        VENDOR       ORDER_TIMESTAMP      ZIP        delta     ORDER  CALLS
0   7/13/2016  2016-07-13 00:19:09  7249534228   8009894581  CORNERSTONE  NaT                  NaN        NaT       0      1
1   7/13/2016  2016-07-13 00:19:10  9207482180   8009894581  CORNERSTONE  NaT                  NaN        NaT       0      1
2   7/13/2016  2016-07-13 00:19:22  2405870965   8009894581  CORNERSTONE  NaT                  NaN        NaT       0      1
3   7/13/2016  2016-07-13 00:19:29  6192537800   8009894581  CORNERSTONE  NaT                  NaN        NaT       0      1
4   7/13/2016  2016-07-13 00:21:00  2405870965   8009894581  CORNERSTONE  NaT                  NaN        NaT       0      1
5   7/13/2016  2016-07-13 11:31:19  9857140062   8009136242  ACE          NaT                  NaN        NaT       0      1
6   7/13/2016  2016-07-13 12:50:12  5802260487   8009137764  ACE          NaT                  NaN        NaT       0      1
7   7/13/2016  2016-07-13 14:13:08  Unavailable  8009135189  CORNERSTONE  NaT                  NaN        NaT       0      1
8   7/13/2016  2016-07-13 16:29:13  7172665487   8009140816  CORNERSTONE  NaT                  NaN        NaT       0      1
9   7/13/2016  2016-07-13 17:02:25  8079819744   8009131719  CORNERSTONE  NaT                  NaN        NaT       0      1
10  7/13/2016  2016-07-13 19:21:54  8435466441   8009135302  CORNERSTONE  NaT                  NaN        NaT       0      1
11  7/13/2016  2016-07-13 20:41:28  9063462078   8009894581  CORNERSTONE  NaT                  NaN        NaT       0      1
12  7/13/2016  2016-07-13 20:50:19  6143772125   8009084876  CORNERSTONE  NaT                  NaN        NaT       0      1
13  7/13/2016  2016-07-13 20:50:20  8148563460   8009084876  CORNERSTONE  NaT                  NaN        NaT       0      1
14  7/13/2016  2016-07-13 20:50:22  5616837515   8009084876  CORNERSTONE  NaT                  NaN        NaT       0      1
15  7/13/2016  2016-07-13 20:53:07  9032270226   8009084876  CORNERSTONE  NaT                  NaN        NaT       0      1
16  7/13/2016  2016-07-13 23:58:38  9283779292   8009131653  CORNERSTONE  2016-07-13 23:59:26  223032109  00:00:48  0      1
17  7/13/2016  2016-07-13 21:14:08  9283779292   8009131653  CORNERSTONE  2016-07-13 23:59:26  223032109  02:45:18  0      1

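1 Answer:

Answer 0: (score: 2)

If I understand correctly, the first part already works for you, and for the second part you want to flag the row with the lowest delta value within each ANI group. The code below identifies those rows and then assigns ORDER = 1 on them.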

# Boolean mask: True where a row's delta equals the smallest delta for its ANI
cond = calls.groupby('ANI')['delta'].transform('min') == calls['delta']
calls.loc[cond, 'ORDER'] = 1
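
To see the mask in action, here is a minimal self-contained sketch. The five-row sample frame is a hypothetical subset modeled on the rows in the question, delta is assumed to be ORDER_TIMESTAMP minus TIMESTAMP, and the names min_delta and closest are introduced only for illustration. The explicit notna() check is an extra safeguard I added so that rows without an order can never be flagged.

```python
import pandas as pd

# Hypothetical sample modeled on the question's data: one caller with two
# calls and no order, one caller with two calls and one order, and one
# caller with a single call and no order.
calls = pd.DataFrame({
    "ANI": ["2405870965", "2405870965", "9283779292", "9283779292", "7249534228"],
    "TIMESTAMP": pd.to_datetime([
        "2016-07-13 00:19:22", "2016-07-13 00:21:00",
        "2016-07-13 23:58:38", "2016-07-13 21:14:08",
        "2016-07-13 00:19:09",
    ]),
    "ORDER_TIMESTAMP": pd.to_datetime([
        None, None,
        "2016-07-13 23:59:26", "2016-07-13 23:59:26",
        None,
    ]),
})

# Assumption: delta is the time between the call and the order.
calls["delta"] = calls["ORDER_TIMESTAMP"] - calls["TIMESTAMP"]
calls["ORDER"] = 0

# Smallest delta per caller, broadcast back to every row of that caller.
min_delta = calls.groupby("ANI")["delta"].transform("min")

# Flag rows that have an order and whose delta is the caller's minimum.
closest = calls["delta"].notna() & (calls["delta"] == min_delta)
calls.loc[closest, "ORDER"] = 1

print(calls[["ANI", "TIMESTAMP", "delta", "ORDER"]])
```

In this sketch only the 23:58:38 call from 9283779292 ends up with ORDER = 1, because its delta of 48 seconds is the smallest in its group. A caller with a single call and a non-null delta would also be flagged, since its only delta is the group minimum; whether that row additionally needs a ZIP check depends on your data.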

Hope this helps.