This is a follow-up to this question. I have two dataframes:
print df_1
timestamp A B
0 2016-05-15 0.020228 0.026572
1 2016-05-15 0.057780 0.175499
2 2016-05-15 0.098808 0.620986
3 2016-05-17 0.158789 1.014819
4 2016-05-17 0.038129 2.384590
5 2018-05-17 0.011111 9.999999
print df_2
start end event
0 2016-05-14 2016-05-16 E1
1 2016-05-14 2016-05-16 E2
2 2016-05-17 2016-05-18 E3
I would like to merge the event column of df_2 into df_1 whenever the timestamp in df_1 falls between start and end in df_2.
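The problem, and the difference from this question, is:
1) Events E1 and E2 have the same start and end.
2) Also, in df_1, the sixth row does not fall within any interval.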
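In the end, I would like to keep both events for rows that match more than one interval, and to have NA for rows that do not match any event. That is the resulting dataframe I am hoping to get.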
Answer 0 (score: 0)
import pandas as pd
df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
'end':['2016-05-16','2016-05-16','2016-05-18'],
'event':['E1','E2','E3']})
df_1.timestamp = pd.to_datetime(df_1.timestamp, format='%Y-%m-%d')
df_2.start = pd.to_datetime(df_2.start, format='%Y-%m-%d')
df_2.end = pd.to_datetime(df_2.end, format='%Y-%m-%d')
# reshape df_2 to long format with one row per event per day in its interval, then left-merge on timestamp
df_2_2 = pd.melt(df_2, id_vars='event', value_name='timestamp')
df_2_2.timestamp = pd.to_datetime(df_2_2.timestamp)
df_2_2.set_index('timestamp', inplace=True)
df_2_2.drop('variable', axis=1, inplace=True)
df_2_3 = df_2_2.groupby('event').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
df_2 = pd.merge(df_2, df_2_3)
df_2 = df_2.drop(columns=['start', 'end'])
df_1 = df_1.merge(df_2,on='timestamp', how='left')
print(df_1)
timestamp A event
0 2016-05-15 1 E1
1 2016-05-15 1 E2
2 2016-05-15 1 E1
3 2016-05-15 1 E2
4 2016-05-15 1 E1
5 2016-05-15 1 E2
6 2016-05-17 1 E3
7 2016-05-17 1 E3
8 2018-05-17 1 NaN
Credit goes to this.
There is also the following solution, but it does not give NA for the last row:
import pandas as pd
df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
'end':['2016-05-16','2016-05-16','2016-05-18'],
'event':['E1','E2','E3']})
df_try2 = pd.merge(df_1.assign(key=1), df_2.assign(key=1), on='key').query('timestamp >= start and timestamp <= end')
print(df_try2)
timestamp A key start end event
0 2016-05-15 1 1 2016-05-14 2016-05-16 E1
1 2016-05-15 1 1 2016-05-14 2016-05-16 E2
3 2016-05-15 1 1 2016-05-14 2016-05-16 E1
4 2016-05-15 1 1 2016-05-14 2016-05-16 E2
6 2016-05-15 1 1 2016-05-14 2016-05-16 E1
7 2016-05-15 1 1 2016-05-14 2016-05-16 E2
11 2016-05-17 1 1 2016-05-17 2016-05-18 E3
14 2016-05-17 1 1 2016-05-17 2016-05-18 E3
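The cross-join approach above drops the 2018-05-17 row because the filter removes every pair that does not match. One possible way to get that row back is to tag each row of df_1 with an identifier, filter the cross join as before, and then left-merge the matches onto df_1. This is only a minimal sketch under that assumption: `row_id`, `left`, `pairs`, `matched` and `result` are hypothetical names introduced here, and the cross join may be too large for big inputs.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'timestamp': ['2016-05-15', '2016-05-15', '2016-05-15',
                  '2016-05-17', '2016-05-17', '2018-05-17'],
    'A': [1, 1, 1, 1, 1, 1],
})
df_2 = pd.DataFrame({
    'start': ['2016-05-14', '2016-05-14', '2016-05-17'],
    'end': ['2016-05-16', '2016-05-16', '2016-05-18'],
    'event': ['E1', 'E2', 'E3'],
})
df_1['timestamp'] = pd.to_datetime(df_1['timestamp'])
df_2['start'] = pd.to_datetime(df_2['start'])
df_2['end'] = pd.to_datetime(df_2['end'])

# Tag every original row of df_1 (row_id is a hypothetical helper column)
# so that rows without a match can be restored after filtering.
left = df_1.reset_index().rename(columns={'index': 'row_id'})

# Cross join through a constant key, then keep only the pairs whose
# timestamp lies inside [start, end] (both bounds inclusive).
pairs = pd.merge(left.assign(key=1), df_2.assign(key=1), on='key')
inside = (pairs['timestamp'] >= pairs['start']) & (pairs['timestamp'] <= pairs['end'])
matched = pairs.loc[inside, ['row_id', 'event']]

# Left-merge the matches back: rows with no interval get NaN in event,
# rows covered by several intervals are repeated once per event.
result = left.merge(matched, on='row_id', how='left').drop(columns='row_id')
print(result)
```

With the sample data this should give the same nine rows as the first solution, including the last row with a missing event.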