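I have a table like this:

|    | id   | date_of_field_update | status_prior_to_update | status_after_update |
|---:|-----:|----------------------|------------------------|---------------------|
| 0  | 226  | 12/6/2017 16:46      | Closed Lost            | Discovery           |
| 1  | 226  | 12/8/2017 14:56      | Discovery              | Proposal            |
| 2  | 9792 | 12/7/2017 10:15      | Demo                   | Proposal            |
| 3  | 9792 | 12/7/2017 10:14      | Discovery              | Demo                |
| 4  | 9796 | 12/6/2017 12:33      | Proposal               | Finalization        |
| 5  | 9796 | 1/16/2018 10:03      | Finalization           | Closed Won          |
| 6  | 7426 | 1/17/2018 16:17      | Initial Contact        | Targeted            |
| 7  | 7426 | 1/17/2018 16:25      | Targeted               | Discovery           |
| 8  | 7426 | 1/29/2018 11:39      | Discovery              | Demo                |
| 9  | 7426 | 1/30/2018 9:46       | Demo                   | Proposal            |
| 10 | 1292 | 1/17/2018 14:48      | Unqualified            | Targeted            |
| 11 | 1416 | 12/15/2017 12:39     | Discovery              | Targeted            |
| 12 | 2475 | 1/3/2018 15:48       | Closed Lost            | Targeted            |
| 13 | 2558 | 12/13/2017 10:21     | Finalist               | Proposal            |
| 14 | 2558 | 1/5/2018 13:06       | Proposal               | Closed Lost         |

Each record represents a sales opportunity and its status (e.g. "Proposal" or "Demo"): the date the opportunity's status field was updated, and the status before and after that update.

Here is a Python list version of the sample data above, on Pastebin.

How can I write a function that returns the status of a particular `id` at an arbitrary date?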
That is, given a value from the `id` column and a date, something like:

```python
def get_opp_status(id, date):
    """Return status of given opp at given date"""
    # find status of opp at given date
    return status
```
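For experimenting with the answers, the 15 sample rows from the table above can be typed straight into a DataFrame. This is only the table shown in the question, not the Pastebin data, and the name `rows` is just for illustration. The dates are left as strings because both answers convert them with `pd.to_datetime`.

```python
import pandas as pd

# The 15 sample rows from the question's table (dates kept as strings)
rows = [
    (226, "12/6/2017 16:46", "Closed Lost", "Discovery"),
    (226, "12/8/2017 14:56", "Discovery", "Proposal"),
    (9792, "12/7/2017 10:15", "Demo", "Proposal"),
    (9792, "12/7/2017 10:14", "Discovery", "Demo"),
    (9796, "12/6/2017 12:33", "Proposal", "Finalization"),
    (9796, "1/16/2018 10:03", "Finalization", "Closed Won"),
    (7426, "1/17/2018 16:17", "Initial Contact", "Targeted"),
    (7426, "1/17/2018 16:25", "Targeted", "Discovery"),
    (7426, "1/29/2018 11:39", "Discovery", "Demo"),
    (7426, "1/30/2018 9:46", "Demo", "Proposal"),
    (1292, "1/17/2018 14:48", "Unqualified", "Targeted"),
    (1416, "12/15/2017 12:39", "Discovery", "Targeted"),
    (2475, "1/3/2018 15:48", "Closed Lost", "Targeted"),
    (2558, "12/13/2017 10:21", "Finalist", "Proposal"),
    (2558, "1/5/2018 13:06", "Proposal", "Closed Lost"),
]
df = pd.DataFrame(rows, columns=["id", "date_of_field_update",
                                 "status_prior_to_update", "status_after_update"])
```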
Answer 0 (score: 1)
Here is another way, assuming your date column is in chronological order:
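```python
import pandas as pd

df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'])

def get_opp_status(df, id, date):
    # Updates for this id made on or before `date`; one .loc with a combined
    # mask avoids chained boolean indexing and its reindexing warning
    stat = df.loc[(df['id'] == id) & (df['date_of_field_update'] <= date),
                  'status_after_update']
    if len(stat) > 0:
        # Relies on date order: the last match is the most recent update
        return stat.iloc[-1]
    else:
        # No update yet: return the status held before the first update
        return df.loc[df['id'] == id, 'status_prior_to_update'].iloc[0]

get_opp_status(df, 7426, pd.to_datetime('2018-01-17 16:24:00'))  # 'Targeted'
```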
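The answer above depends on row order, and the sample data is not fully ordered: the two 9792 rows are listed with 10:15 before 10:14. On those rows, asking for 9792 at 10:20 takes the last matching row (the 10:14 one) and returns "Demo" instead of "Proposal". A minimal variant that sorts each opportunity's rows first is sketched below; the parameter name `opp_id` and the `KeyError` for unknown ids are assumptions added here, not part of the original answer.

```python
import pandas as pd

def get_opp_status(df, opp_id, date):
    """Status of opportunity `opp_id` at timestamp `date`.

    Sorts this opportunity's rows by update time, so it does not rely on
    the DataFrame already being in chronological order.
    """
    opp = df[df["id"] == opp_id].sort_values("date_of_field_update")
    if opp.empty:
        raise KeyError(f"no rows for id {opp_id}")
    done = opp[opp["date_of_field_update"] <= date]
    if done.empty:
        # Queried before the first recorded change: use the starting status
        return opp["status_prior_to_update"].iloc[0]
    # Latest change at or before `date`
    return done["status_after_update"].iloc[-1]


# The two 9792 rows from the question, in their original (unsorted) order
df = pd.DataFrame({
    "id": [9792, 9792],
    "date_of_field_update": pd.to_datetime(["2017-12-07 10:15", "2017-12-07 10:14"]),
    "status_prior_to_update": ["Demo", "Discovery"],
    "status_after_update": ["Proposal", "Demo"],
})
print(get_opp_status(df, 9792, pd.Timestamp("2017-12-07 10:20")))  # Proposal
```

Sorting per call costs a little on large frames; if the lookups are frequent, sorting the whole DataFrame once by `['id', 'date_of_field_update']` and dropping the per-call sort gives the same result.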
Answer 1 (score: 0)
Here is one way: sort the data up front, then take the first matching item from a generator.
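```python
import pandas as pd

df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'])
df['date'] = df['date_of_field_update'].dt.normalize()  # day only, time dropped
df = df.sort_values(['id', 'date_of_field_update'], ascending=[True, False])  # newest first per id

def get_opp_status(df, myid, mydate):
    # First hit is the latest update before mydate; None if there is none
    return next((k for i, j, k in zip(df['id'], df['date'], df['status_after_update'])
                 if j < mydate and i == myid), None)

get_opp_status(df, 226, pd.to_datetime('2017-12-10'))  # Proposal
```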