Pandas - Field category between date ranges

Time: 2018-02-09 16:03:32

Tags: python pandas

I have a table like this:

    id    date_of_field_update  status_prior_to_update  status_after_update
0   226   12/6/2017 16:46       Closed Lost             Discovery
1   226   12/8/2017 14:56       Discovery               Proposal
2   9792  12/7/2017 10:15       Demo                    Proposal
3   9792  12/7/2017 10:14       Discovery               Demo
4   9796  12/6/2017 12:33       Proposal                Finalization
5   9796  1/16/2018 10:03       Finalization            Closed Won
6   7426  1/17/2018 16:17       Initial Contact         Targeted
7   7426  1/17/2018 16:25       Targeted                Discovery
8   7426  1/29/2018 11:39       Discovery               Demo
9   7426  1/30/2018 9:46        Demo                    Proposal
10  1292  1/17/2018 14:48       Unqualified             Targeted
11  1416  12/15/2017 12:39      Discovery               Targeted
12  2475  1/3/2018 15:48        Closed Lost             Targeted
13  2558  12/13/2017 10:21      Finalist                Proposal
14  2558  1/5/2018 13:06        Proposal                Closed Lost

Each record is one update to a sales opportunity's status field (e.g. "Proposal" or "Demo"): the opportunity id, the date the field was updated, and the status before and after that update.

Here is a Python list version of the sample data above, on Pastebin.

How do I write a function that returns the status of a particular id on an arbitrary date?

That is:

def get_opp_status(id, date):
    """Return status of given opp at given date"""
    # find status of opp at given date
    return status

Update:

  • Removed an irrelevant column from the sample data.
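If the Pastebin list isn't at hand, the sample rows can be loaded into a DataFrame directly. This is a minimal sketch that retypes the table above as tuples (the tuple layout is an assumption, not the Pastebin version) and parses the timestamps with an explicit month/day/year format, so that comparisons against dates are chronological rather than string comparisons.

```python
import pandas as pd

# The question's sample rows, retyped as (id, date_of_field_update, prior, after)
records = [
    (226, '12/6/2017 16:46', 'Closed Lost', 'Discovery'),
    (226, '12/8/2017 14:56', 'Discovery', 'Proposal'),
    (9792, '12/7/2017 10:15', 'Demo', 'Proposal'),
    (9792, '12/7/2017 10:14', 'Discovery', 'Demo'),
    (9796, '12/6/2017 12:33', 'Proposal', 'Finalization'),
    (9796, '1/16/2018 10:03', 'Finalization', 'Closed Won'),
    (7426, '1/17/2018 16:17', 'Initial Contact', 'Targeted'),
    (7426, '1/17/2018 16:25', 'Targeted', 'Discovery'),
    (7426, '1/29/2018 11:39', 'Discovery', 'Demo'),
    (7426, '1/30/2018 9:46', 'Demo', 'Proposal'),
    (1292, '1/17/2018 14:48', 'Unqualified', 'Targeted'),
    (1416, '12/15/2017 12:39', 'Discovery', 'Targeted'),
    (2475, '1/3/2018 15:48', 'Closed Lost', 'Targeted'),
    (2558, '12/13/2017 10:21', 'Finalist', 'Proposal'),
    (2558, '1/5/2018 13:06', 'Proposal', 'Closed Lost'),
]
df = pd.DataFrame(records, columns=['id', 'date_of_field_update',
                                    'status_prior_to_update', 'status_after_update'])

# Parse as month/day/year hour:minute; single-digit months, days and hours are accepted
df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'],
                                            format='%m/%d/%Y %H:%M')
```

Note that the rows are not sorted by date within every id (the two 9792 rows are listed newest first), which matters for any lookup that relies on row position.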

2 answers:

Answer 0 (score: 1)

Here's another way, assuming the rows are sorted by date within each id:

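import pandas as pd

df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'])

def get_opp_status(df, id, date):
    # Statuses set by this opp's updates made on or before `date`
    stat = df.loc[(df['id'] == id) & (df['date_of_field_update'] <= date), 'status_after_update']
    if len(stat) > 0:
        # The last matching row is the most recent update (relies on date-sorted rows)
        return stat.iloc[-1]
    else:
        # No update yet: return the status the opp had before its first update
        return df.loc[df['id'] == id, 'status_prior_to_update'].iloc[0]

get_opp_status(df, 7426, pd.to_datetime('2018-01-17 16:24:00'))  # 'Targeted'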
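The approach above depends on row order through `iloc[-1]` and `iloc[0]`. A minimal sketch of an order-independent variant picks rows by timestamp with `idxmax`/`idxmin` instead; the helper name `get_opp_status_unsorted`, the four-row subset of the sample data, and returning `None` for an unknown id are assumptions for illustration.

```python
import pandas as pd

# Subset of the question's sample rows; the two 9792 rows are out of date order
df = pd.DataFrame(
    [(9792, '12/7/2017 10:15', 'Demo', 'Proposal'),
     (9792, '12/7/2017 10:14', 'Discovery', 'Demo'),
     (7426, '1/17/2018 16:17', 'Initial Contact', 'Targeted'),
     (7426, '1/17/2018 16:25', 'Targeted', 'Discovery')],
    columns=['id', 'date_of_field_update', 'status_prior_to_update', 'status_after_update'])
df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'], format='%m/%d/%Y %H:%M')


def get_opp_status_unsorted(df, opp_id, date):
    """Status of `opp_id` at `date` without relying on row order (hypothetical helper)."""
    opp = df[df['id'] == opp_id]
    if opp.empty:
        return None  # id not present in the data
    done = opp[opp['date_of_field_update'] <= date]
    if done.empty:
        # Before the earliest update: use the status it was changed *from*
        first = opp['date_of_field_update'].idxmin()
        return opp.loc[first, 'status_prior_to_update']
    # Otherwise the latest update on or before `date` sets the status
    latest = done['date_of_field_update'].idxmax()
    return done.loc[latest, 'status_after_update']
```

On this subset, a position-based fallback would report 'Demo' for 9792 before 2017-12-07 10:14, because that is the prior status on the first listed row, whereas the sketch returns 'Discovery' from the chronologically earliest update.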

Answer 1 (score: 0)

Here's one way. I sort in advance and then simply take the first matching item from a generator.

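import pandas as pd

df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'])
df['date'] = df['date_of_field_update'].dt.normalize()
# Newest update first within each id, so the first match below is the latest one
df = df.sort_values(['id', 'date_of_field_update'], ascending=[True, False])

def get_opp_status(df, myid, mydate):
    # Status set by the newest update whose day-normalized date is before mydate (None if none)
    return next((k for i, j, k in zip(df['id'], df['date'], df['status_after_update'])
                 if j < mydate and i == myid), None)

get_opp_status(df, 226, pd.to_datetime('2017-12-10'))  # Proposal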
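To get the status of every opportunity at one date instead of one id at a time, `pd.merge_asof` with `by='id'` and `direction='backward'` matches each id to its last update at or before that date. This is a swapped-in technique rather than either answerer's method; the helper name `status_snapshot`, the four-row subset of the sample data, and falling back to the earliest `status_prior_to_update` for ids with no update yet (mirroring the first answer's fallback) are assumptions.

```python
import pandas as pd

# Subset of the question's sample rows
df = pd.DataFrame(
    [(226, '12/6/2017 16:46', 'Closed Lost', 'Discovery'),
     (226, '12/8/2017 14:56', 'Discovery', 'Proposal'),
     (7426, '1/17/2018 16:17', 'Initial Contact', 'Targeted'),
     (7426, '1/17/2018 16:25', 'Targeted', 'Discovery')],
    columns=['id', 'date_of_field_update', 'status_prior_to_update', 'status_after_update'])
df['date_of_field_update'] = pd.to_datetime(df['date_of_field_update'], format='%m/%d/%Y %H:%M')


def status_snapshot(df, date):
    """Status of every id at `date`, using pd.merge_asof (hypothetical helper)."""
    # merge_asof requires both frames to be sorted on the `on` column
    updates = df.sort_values('date_of_field_update')
    query = updates[['id']].drop_duplicates().copy()
    query['date_of_field_update'] = pd.Timestamp(date)
    # Use the same datetime dtype as the update timestamps for the merge key
    query['date_of_field_update'] = query['date_of_field_update'].astype(
        updates['date_of_field_update'].dtype)
    # For each id, take the last update at or before `date`
    snap = pd.merge_asof(query, updates, on='date_of_field_update', by='id',
                         direction='backward')
    # Ids with no update yet get the prior status of their earliest update
    first_prior = updates.groupby('id')['status_prior_to_update'].first()
    status = snap['status_after_update'].fillna(snap['id'].map(first_prior))
    return pd.Series(status.values, index=snap['id'].values, name='status').sort_index()
```

For example, `status_snapshot(df, '2017-12-10')` gives 'Proposal' for 226 and 'Initial Contact' for 7426, since 7426 has no update before 2018. Unlike the generator approach, this compares full timestamps, so same-day updates before the given time are included.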