I have two dataframes, master_source and main_df. I want to add start_date and end_date from main_df to master_source, because this will ultimately let me set matching indexes on both dataframes for a merge.
My initial logic is to check that 1) market matches in both dataframes, and 2) viewed_date in master_source falls between start_date and end_date in main_df. If both conditions hold, I want to add start_date and end_date to master_source.
Note that viewed_date, start_date, and end_date have all been converted to datetime objects.

Here is sample input for each dataframe:

master_source

    viewed_date  market
    2019-04-15   Abilene, TX
    2019-04-11   Yuma, AZ
    2019-04-19   Abilene, TX

main_df

    market       start_date  end_date
    Abilene, TX  2019-04-11  2019-04-17
    Yuma, AZ     2019-04-11  2019-04-17
    Abilene, TX  2019-04-18  2019-04-26

My code:

    def add_dates(row):
        matches = main_df[
            (main_df['market'] == row['market']) &
            (row['viewed_date'].between(main_df['start_date'], main_df['end_date']))]
        start = matches['start_date'].values[0] if len(matches) > 0 else None
        end = matches['end_date'].values[0] if len(matches) > 0 else None
        row.loc['start_end', 'end_date'] = start, end
        return row

    master_source = master_source.apply(add_dates, axis=1)

My known issue so far is that this code raises an error, and I also suspect I am not adding the two new columns correctly (it looks like I am adding one column instead of two).

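Answer 0 (score: 1)

Handle the start date and the end date separately:

    def add_start_dates(market, viewed):
        # Narrow main_df to the rows for this market.
        matches = main_df[(main_df['market'] == market)]
        # Keep the rows whose date range contains the viewed date (inclusive).
        matches2 = matches[(matches['start_date'] <= viewed) &
                           (matches['end_date'] >= viewed)]
        if len(matches2) > 0:
            return matches2['start_date'].iloc[0]
        else:
            # No matching range: fall back to the viewed date itself.
            return viewed

Do the same for the end date.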
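
A minimal sketch of what the end-date counterpart could look like, mirroring add_start_dates. The name add_end_dates is an assumption for illustration, and the sample frames are copied from the question only so that the block runs on its own. As in add_start_dates, a row with no matching range falls back to its own viewed_date.

```python
import pandas as pd

# Sample data copied from the question so the sketch is self-contained.
main_df = pd.DataFrame({
    "market": ["Abilene, TX", "Yuma, AZ", "Abilene, TX"],
    "start_date": pd.to_datetime(["2019-04-11", "2019-04-11", "2019-04-18"]),
    "end_date": pd.to_datetime(["2019-04-17", "2019-04-17", "2019-04-26"]),
})
master_source = pd.DataFrame({
    "viewed_date": pd.to_datetime(["2019-04-15", "2019-04-11", "2019-04-19"]),
    "market": ["Abilene, TX", "Yuma, AZ", "Abilene, TX"],
})


def add_end_dates(market, viewed):
    """Hypothetical counterpart of add_start_dates that returns end_date."""
    # Rows for the same market whose range contains the viewed date (inclusive).
    matches = main_df[(main_df["market"] == market)
                      & (main_df["start_date"] <= viewed)
                      & (main_df["end_date"] >= viewed)]
    # Fall back to the viewed date when no range matches.
    return matches["end_date"].iloc[0] if len(matches) > 0 else viewed


master_source["end_date"] = [
    add_end_dates(m, v)
    for m, v in zip(master_source["market"], master_source["viewed_date"])
]
```
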

    print(master_source)
    print()
    print(main_df)
    print()
    master_source['start_date'] = [add_start_dates(m, v)
                                   for m, v in zip(master_source['market'],
                                                   master_source['viewed_date'])]
    print(master_source)

Output:

        market viewed_date
    0  abilene  2019-04-15
    1     yuma  2019-04-11
    2  abilene  2019-04-19

        end_date   market start_date
    0 2019-04-17  abilene 2019-04-11
    1 2019-04-17     yuma 2019-04-11
    2 2019-04-26  abilene 2019-04-18

        market viewed_date start_date
    0  abilene  2019-04-15 2019-04-11
    1     yuma  2019-04-11 2019-04-11
    2  abilene  2019-04-19 2019-04-18
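
As a possible alternative to the row-by-row lookups, both columns can be added in one pass by merging on market and then filtering on the date range. This is a different technique from the one in the answer, sketched under a few assumptions: the sample data is copied from the question, the names candidates, in_range, matched, and result are illustrative, master_source has a default unnamed index (so reset_index() creates a column called index), and only the first matching range per row should be kept. Unlike the answer's functions, rows with no matching range get NaT rather than the viewed date.

```python
import pandas as pd

# Sample data copied from the question so the sketch is self-contained.
master_source = pd.DataFrame({
    "viewed_date": pd.to_datetime(["2019-04-15", "2019-04-11", "2019-04-19"]),
    "market": ["Abilene, TX", "Yuma, AZ", "Abilene, TX"],
})
main_df = pd.DataFrame({
    "market": ["Abilene, TX", "Yuma, AZ", "Abilene, TX"],
    "start_date": pd.to_datetime(["2019-04-11", "2019-04-11", "2019-04-18"]),
    "end_date": pd.to_datetime(["2019-04-17", "2019-04-17", "2019-04-26"]),
})

# Pair every master_source row with every main_df row for the same market,
# keeping the original row label in an "index" column.
candidates = master_source.reset_index().merge(main_df, on="market", how="left")

# Keep only the pairs whose date range contains viewed_date (inclusive bounds).
in_range = candidates["viewed_date"].between(candidates["start_date"],
                                             candidates["end_date"])
matched = candidates[in_range].drop_duplicates("index").set_index("index")

# Align the two new columns back onto the original rows by index.
result = master_source.join(matched[["start_date", "end_date"]])
print(result)
```

The merge produces one candidate row per (viewed row, range) pair within a market, so memory use grows with the number of ranges per market; for large frames with sorted, non-overlapping ranges, pandas.merge_asof may be worth considering instead.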