I have two DataFrames, df and events, as shown below:
import pandas as pd
df = pd.DataFrame({'Place':['university','residential','hospital','university','residential','hospital'],
'Date':['2017-01-01','2017-01-01','2017-01-01','2017-01-02','2017-01-02','2017-01-02'],
'Event':['None','None','None','None','None','None']
})
events = pd.DataFrame({'Place':['university','residential','hospital'], 'Start_Date':['2017-01-01','2017-01-01','2017-01-01'],
'End_Date':['2017-02-26','2017-01-02','2017-01-02'],
'Event':['UniHolidays','PublicHoliday','PublicHoliday']})
#Convert to datetime
events.Start_Date = pd.to_datetime(events.Start_Date.astype(str), format='%Y-%m-%d')
events.End_Date = pd.to_datetime(events.End_Date.astype(str), format='%Y-%m-%d')
df.Date = pd.to_datetime(df.Date.astype(str), format='%Y-%m-%d')
df has one record per place for every date in 2017.
df:
Date Place Event
2017-01-01 university None
2017-01-01 residential None
2017-01-01 hospital None
2017-01-02 university None
2017-01-02 residential None
2017-01-02 hospital None
The second DataFrame contains the events for these places, but as date ranges.
events:
Place Start_Date End_Date Event
university 2017-01-01 2017-02-26 UniHolidays
residential 2017-01-01 2017-01-02 PublicHoliday
hospital 2017-01-01 2017-01-02 PublicHoliday
The task is to update df using events: if df.Place == events.Place and df.Date falls within the range (events.Start_Date, events.End_Date), then df.Event should be set to the corresponding events.Event.
The expected output is:
Date Place Event
2017-01-01 university UniHolidays
2017-01-01 residential PublicHoliday
2017-01-01 hospital PublicHoliday
2017-01-02 university UniHolidays
2017-01-02 residential PublicHoliday
2017-01-02 hospital PublicHoliday
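There are no overlapping events, and each place has a single event record.
So far I have been looking at "Populate column in data frame based on a range found in another dataframe", but I could not get it to work. Any help is appreciated. Thanks!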
Answer 0 (score: 0)
Solution 1:
Add:
df['Event']=events['Event'].tolist()*2
at the end of your code.
Now:
print(df)
gives:
Date Event Place
0 2017-01-01 UniHolidays university
1 2017-01-01 PublicHoliday residential
2 2017-01-01 PublicHoliday hospital
3 2017-01-02 UniHolidays university
4 2017-01-02 PublicHoliday residential
5 2017-01-02 PublicHoliday hospital
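Solution 1 never looks at the dates. It only repeats the event list, so it is correct only when df cycles through the places in the same order as events and every date falls inside the event range. A minimal sketch of a check for the ordering part, assuming the same column names as in the question (the names repeats, remainder and same_order are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'Place': ['university', 'residential', 'hospital'] * 2})
events = pd.DataFrame({'Place': ['university', 'residential', 'hospital'],
                       'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday']})

# Repeating the list is only safe if df holds a whole number of copies of
# events and its Place column cycles in the same order as events.Place.
repeats, remainder = divmod(len(df), len(events))
same_order = remainder == 0 and df['Place'].tolist() == events['Place'].tolist() * repeats
print(same_order)
```

If this prints False, repeating the list would attach events to the wrong rows.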
----------------------------------------
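Solution 2:
If you want the column inserted at the right position, do the following:
# pass the column by keyword; the positional axis argument drop('Event',1) is rejected by recent pandas
df=df.drop(columns='Event')
df.insert(2,'Event',events['Event'].tolist()*2)
at the end of your code.
Now:
print(df)
Output: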
Date Place Event
0 2017-01-01 university UniHolidays
1 2017-01-01 residential PublicHoliday
2 2017-01-01 hospital PublicHoliday
3 2017-01-02 university UniHolidays
4 2017-01-02 residential PublicHoliday
5 2017-01-02 hospital PublicHoliday
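Solution 1 and Solution 2 both work,
but it is better not to hard-code the number of repetitions.
Use:
# compute the repeat count before dropping the column, with integer division
n=len(df)//len(events)
df=df.drop(columns='Event')
df.insert(2,'Event',events['Event'].tolist()*n)
at the end of your code.
Now:
print(df)
Output: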
Date Place Event
0 2017-01-01 university UniHolidays
1 2017-01-01 residential PublicHoliday
2 2017-01-01 hospital PublicHoliday
3 2017-01-02 university UniHolidays
4 2017-01-02 residential PublicHoliday
5 2017-01-02 hospital PublicHoliday
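The solutions above depend on row order and do not test the date range the question asks about. One possible alternative, which is not part of the original answer, is to join on Place and keep the event only where Date lies between Start_Date and End_Date. This is a sketch that assumes, as the question states, that each place has exactly one event record; the names merged and in_range are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'] * 2,
    'Date': pd.to_datetime(['2017-01-01'] * 3 + ['2017-01-02'] * 3),
})
events = pd.DataFrame({
    'Place': ['university', 'residential', 'hospital'],
    'Start_Date': pd.to_datetime(['2017-01-01'] * 3),
    'End_Date': pd.to_datetime(['2017-02-26', '2017-01-02', '2017-01-02']),
    'Event': ['UniHolidays', 'PublicHoliday', 'PublicHoliday'],
})

# Left-join on Place so every df row keeps its position (one event per place),
# then keep the event name only where Date lies inside [Start_Date, End_Date].
merged = df.merge(events, on='Place', how='left')
in_range = merged['Date'].between(merged['Start_Date'], merged['End_Date'])
df['Event'] = merged['Event'].where(in_range, 'None').to_numpy()
print(df)
```

Rows whose date falls outside the range, or whose place has no event, keep the placeholder string 'None'. If a place could have several events, the merge would produce more than one row per df row and the assignment back to df would need a different approach.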