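I have the following table:
I am trying to do two things with it:
1) If a caller appears only once, any such single-call entry that has a ZIP entry should get a 1 under ORDER.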
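# work with callers that appear only once
import pandas as pd

def order_chk(x):
    # a row counts as an order only if both the order timestamp and the ZIP are present
    if pd.isnull(x['ORDER_TIMESTAMP']) or pd.isnull(x['ZIP']):
        return 0
    return 1

# keep only ANIs with a single call, then score each remaining row
calls_t = calls.groupby('ANI').filter(lambda x: len(x) < 2).apply(order_chk, axis=1)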
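2) It gets trickier when there are two calls but only one order; in those cases I want the call closer to the order to get the 1 under the ORDER column (the delta column is a timedelta object).
So the final table would look like this (the yellow shading marks the 1s):
Let me know if I can clarify anything. I have a feeling I am missing something really silly with .apply on groups.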
DATE TIMESTAMP ANI DNIS VENDOR ORDER_TIMESTAMP ZIP delta ORDER CALLS
0 7/13/2016 2016-07-13 00:19:09 7249534228 8009894581 CORNERSTONE NaT NaN NaT 0 1
1 7/13/2016 2016-07-13 00:19:10 9207482180 8009894581 CORNERSTONE NaT NaN NaT 0 1
2 7/13/2016 2016-07-13 00:19:22 2405870965 8009894581 CORNERSTONE NaT NaN NaT 0 1
3 7/13/2016 2016-07-13 00:19:29 6192537800 8009894581 CORNERSTONE NaT NaN NaT 0 1
4 7/13/2016 2016-07-13 00:21:00 2405870965 8009894581 CORNERSTONE NaT NaN NaT 0 1
5 7/13/2016 2016-07-13 11:31:19 9857140062 8009136242 ACE NaT NaN NaT 0 1
6 7/13/2016 2016-07-13 12:50:12 5802260487 8009137764 ACE NaT NaN NaT 0 1
7 7/13/2016 2016-07-13 14:13:08 Unavailable 8009135189 CORNERSTONE NaT NaN NaT 0 1
8 7/13/2016 2016-07-13 16:29:13 7172665487 8009140816 CORNERSTONE NaT NaN NaT 0 1
9 7/13/2016 2016-07-13 17:02:25 8079819744 8009131719 CORNERSTONE NaT NaN NaT 0 1
10 7/13/2016 2016-07-13 19:21:54 8435466441 8009135302 CORNERSTONE NaT NaN NaT 0 1
11 7/13/2016 2016-07-13 20:41:28 9063462078 8009894581 CORNERSTONE NaT NaN NaT 0 1
12 7/13/2016 2016-07-13 20:50:19 6143772125 8009084876 CORNERSTONE NaT NaN NaT 0 1
13 7/13/2016 2016-07-13 20:50:20 8148563460 8009084876 CORNERSTONE NaT NaN NaT 0 1
14 7/13/2016 2016-07-13 20:50:22 5616837515 8009084876 CORNERSTONE NaT NaN NaT 0 1
15 7/13/2016 2016-07-13 20:53:07 9032270226 8009084876 CORNERSTONE NaT NaN NaT 0 1
16 7/13/2016 2016-07-13 23:58:38 9283779292 8009131653 CORNERSTONE 2016-07-13 23:59:26 223032109 00:00:48 0 1
17 7/13/2016 2016-07-13 21:14:08 9283779292 8009131653 CORNERSTONE 2016-07-13 23:59:26 223032109 02:45:18 0 1
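Answer 0 (score: 2)
If I understand correctly, the first part already works for you, and for the second part you want to flag the row with the lowest delta value for each caller (ANI). The code below builds a boolean mask that selects those rows, then assigns ORDER = 1 on them.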
cond = calls.groupby('ANI')['delta'].transform('min') == calls['delta']
calls.loc[cond, 'ORDER'] = 1
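For reference, here is a minimal self-contained sketch of the same idea on a small frame modeled on the rows in the question. The frame is rebuilt by hand for illustration and only includes the columns the mask needs. It relies on NaT never comparing equal to anything, so callers without an order stay at 0.

```python
import pandas as pd

# Toy frame modeled on the question's table (only the columns used here).
calls = pd.DataFrame({
    "ANI": ["7249534228", "2405870965", "2405870965",
            "9283779292", "9283779292"],
    "delta": pd.to_timedelta([None, None, None, "00:00:48", "02:45:18"]),
    "ORDER": 0,
})

# Smallest delta per caller, broadcast back onto every row of that caller.
# A caller whose deltas are all NaT gets NaT here.
min_delta = calls.groupby("ANI")["delta"].transform("min")

# NaT == NaT is False, so rows without an order are never selected.
cond = calls["delta"] == min_delta
calls.loc[cond, "ORDER"] = 1

print(calls)
```

Two things to keep in mind: a caller with a single call and a non-null delta is also flagged by this mask, and if two calls from the same caller have exactly the same delta, both are flagged.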
Hope this helps.