Python: Update column values based on date ranges in another dataframe

Posted: 2018-10-01 01:31:36

Tags: python pandas date dataframe

There are two dataframes, df and events, shown below:

import pandas as pd

df = pd.DataFrame({'Place': ['university', 'residential', 'hospital', 'university', 'residential', 'hospital'],
                   'Date': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-01-02', '2017-01-02', '2017-01-02'],
                   'Event': ['None', 'None', 'None', 'None', 'None', 'None']})
events = pd.DataFrame({'Place': ['university', 'residential', 'hospital'],
                       'Start_Date': ['2017-01-01', '2017-01-01', '2017-01-01'],
                       'End_Date': ['2017-02-26', '2017-01-02', '2017-01-02'],
                       'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday']})

# Convert the date strings to datetime
events.Start_Date = pd.to_datetime(events.Start_Date.astype(str), format='%Y-%m-%d')
events.End_Date = pd.to_datetime(events.End_Date.astype(str), format='%Y-%m-%d')
df.Date = pd.to_datetime(df.Date.astype(str), format='%Y-%m-%d')

df has one record per place for every date in 2017:

df:
    Date         Place            Event
    2017-01-01   university        None
    2017-01-01   residential       None
    2017-01-01   hospital          None
    2017-01-02   university        None
    2017-01-02   residential       None
    2017-01-02   hospital          None

The second dataframe contains the events for these places, but as date ranges:

events:

Place        Start_Date   End_Date     Event
university   2017-01-01   2017-02-26   UniHolidays
residential  2017-01-01   2017-01-02   PublicHoliday
hospital     2017-01-01   2017-01-02   PublicHoliday

The task is to update df using events:

If df.Place == events.Place and df.Date lies within the range events.Start_Date to events.End_Date (inclusive), then df.Event should be set to the corresponding events.Event.
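One way to express this condition directly is a merge on Place followed by a date-range filter. The sketch below is an illustration, not the answer given further down: it rebuilds the question's data with the dates already parsed, treats the range as inclusive at both ends (which is what the expected output implies, e.g. residential on 2017-01-02), and assumes events do not overlap, as the question states, so each df row matches at most one event.

```python
import pandas as pd

# The question's data, with the date columns parsed to datetime up front.
df = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'] * 2,
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-01-02'] * 3),
    'Event': ['None'] * 6,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Start_Date': pd.to_datetime(['2017-01-01'] * 3),
    'End_Date': pd.to_datetime(['2017-02-26', '2017-01-02', '2017-01-02']),
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Pair each df row (tagged with its original index) with the events for the same Place.
# A left join keeps rows whose place has no event at all.
merged = (df.drop(columns='Event')
            .reset_index()
            .merge(events, on='Place', how='left'))

# Keep only pairs whose Date falls inside [Start_Date, End_Date], both ends inclusive.
# Rows without an event get NaT bounds, and comparisons with NaT are False.
in_range = merged['Date'].between(merged['Start_Date'], merged['End_Date'])
matches = merged.loc[in_range].set_index('index')['Event']

# Write the matched events back by original row index; unmatched rows keep their value.
df.loc[matches.index, 'Event'] = matches
print(df)
```

If events could overlap, a row could match twice and the write-back would need a rule for picking one event; the no-overlap condition is what makes this simple version safe.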

The expected output is:

    Date         Place            Event
    2017-01-01   university       UniHolidays
    2017-01-01   residential      PublicHoliday
    2017-01-01   hospital         PublicHoliday
    2017-01-02   university       UniHolidays
    2017-01-02   residential      PublicHoliday
    2017-01-02   hospital         PublicHoliday

There are no overlapping events, and each place has exactly one event record.
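Because each place has exactly one event record, Place can act as a unique key, so each row's event window can be looked up with Series.map instead of a join. A minimal sketch under that assumption (mapping through a Series requires the lookup index to be unique, which is exactly the one-record-per-place condition), again treating both ends of the range as inclusive:

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'] * 2,
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-01-02'] * 3),
    'Event': ['None'] * 6,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Start_Date': pd.to_datetime(['2017-01-01'] * 3),
    'End_Date': pd.to_datetime(['2017-02-26', '2017-01-02', '2017-01-02']),
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# One event per place, so Place is a unique key into events.
lookup = events.set_index('Place')

# Look up each row's event window by its Place (NaT/NaN if the place has no event).
start = df['Place'].map(lookup['Start_Date'])
end = df['Place'].map(lookup['End_Date'])
event = df['Place'].map(lookup['Event'])

# Overwrite Event only where Date lies inside the window; other rows stay unchanged.
in_range = df['Date'].between(start, end)
df.loc[in_range, 'Event'] = event[in_range]
print(df)
```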

So far I have been looking at "Populate column in data frame based on a range found in another dataframe", but I could not get it to work. Any help is appreciated. Thanks!

1 answer:

Answer 0 (score: 0)

Solution 1:

Add:

df['Event']=events['Event'].tolist()*2

at the end of the code.

Now:

print(df)

gives:

        Date          Event        Place
0 2017-01-01    UniHolidays   university
1 2017-01-01  PublicHoliday  residential
2 2017-01-01  PublicHoliday     hospital
3 2017-01-02    UniHolidays   university
4 2017-01-02  PublicHoliday  residential
5 2017-01-02  PublicHoliday     hospital
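Note that this approach never looks at Date, Start_Date or End_Date: it only gives the right result because df's rows cycle through the places in the same order as events, once per date, and every date in this sample falls inside every window. A hedged sketch of a guard that at least verifies the ordering before tiling (the date-range condition is still not checked):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-01-01'] * 3 + ['2017-01-02'] * 3,
    'Place': ['university', 'residential', 'hospital'] * 2,
    'Event': ['None'] * 6,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Tiling events assumes df consists of complete blocks of places in events' order.
n_repeats, remainder = divmod(len(df), len(events))
expected_places = events['Place'].tolist() * n_repeats
if remainder != 0 or df['Place'].tolist() != expected_places:
    raise ValueError("df rows are not complete blocks of places in events' order")

# Safe to tile now (the date ranges are still not checked).
df['Event'] = events['Event'].tolist() * n_repeats
print(df)
```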

----------------------------------------

Solution 2:

If you want the column placed in the right position, do the following:

df = df.drop(columns='Event')
df.insert(2, 'Event', events['Event'].tolist() * 2)

at the end of the code.

Now:

print(df)

Output:

        Date        Place          Event
0 2017-01-01   university    UniHolidays
1 2017-01-01  residential  PublicHoliday
2 2017-01-01     hospital  PublicHoliday
3 2017-01-02   university    UniHolidays
4 2017-01-02  residential  PublicHoliday
5 2017-01-02     hospital  PublicHoliday
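Assigning to a column that already exists keeps it where it is, which is why Solution 1 left Event between Date and Place in its printout (presumably the pandas version used sorted the dict keys when building df). If the goal is only a particular column order, an alternative to dropping and re-inserting at a hard-coded position is to assign and then select the columns in the desired order. A sketch, with df deliberately built in the Date, Event, Place order seen in that printout:

```python
import pandas as pd

# Built with the column order Date, Event, Place shown in Solution 1's printout.
df = pd.DataFrame({
    'Date': ['2017-01-01'] * 3 + ['2017-01-02'] * 3,
    'Event': ['None'] * 6,
    'Place': ['university', 'residential', 'hospital'] * 2,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Overwriting an existing column keeps its position (still between Date and Place)...
df['Event'] = events['Event'].tolist() * 2
# ...so reorder explicitly by selecting the columns in the order wanted.
df = df[['Date', 'Place', 'Event']]
print(df)
```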

----------------------------------------------------------------------

Combining Solution 1 and Solution 2 also works, but it is better to use just one of them.

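Update:

Use:

df = df.drop(columns='Event')
# len(df) rather than len(df['Event']), which no longer exists after the drop;
# // because a list can only be repeated an integer number of times
df.insert(2, 'Event', events['Event'].tolist() * (len(df) // len(events)))

at the end of the code.

Now: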

print(df)

Output:

        Date        Place          Event
0 2017-01-01   university    UniHolidays
1 2017-01-01  residential  PublicHoliday
2 2017-01-01     hospital  PublicHoliday
3 2017-01-02   university    UniHolidays
4 2017-01-02  residential  PublicHoliday
5 2017-01-02     hospital  PublicHoliday
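In Python 3, `/` is true division and always returns a float, and multiplying a list by a float raises a TypeError, so the repeat count has to come from integer division. A sketch of the same tiling using NumPy's np.tile, with an explicit check that len(df) is a whole multiple of len(events) (otherwise the values cannot line up row by row). It keeps the answer's assumption that df's rows follow events' Place order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2017-01-01'] * 3 + ['2017-01-02'] * 3,
    'Place': ['university', 'residential', 'hospital'] * 2,
    'Event': ['None'] * 6,
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Integer division: the repeat count must be an int, with nothing left over.
n_repeats, remainder = divmod(len(df), len(events))
if remainder:
    raise ValueError(f"{len(df)} rows is not a multiple of {len(events)} events")

# np.tile repeats the whole array end to end: [a, b, c, a, b, c, ...].
df['Event'] = np.tile(events['Event'].to_numpy(), n_repeats)
print(df)
```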