I have the following sample DataFrames:
df_1:
from datetime import datetime
import pandas as pd
>>> df_1 = pd.DataFrame(
{"SVDiscrep_Merge": ["2081916SAN", "2081242DFW", "2081248ORD","20874CLE", "2081740DEN"],
"RON_DATE": [datetime(2017,6,1), datetime(2017,6,4), datetime(2017,6,6), datetime(2017,6,7), datetime(2017,6,8)],
"Next SV1 Date": [datetime(2017,6,4), datetime(2017,6,6), datetime(2017,6,7), datetime(2017,6,8), datetime(2017, 6, 18)]})
>>> df_1
SVDiscrep_Merge RON_DATE Next SV1 Date
2081916SAN 6/1/2017 6/4/2017
2081242DFW 6/4/2017 6/6/2017
2081248ORD 6/6/2017 6/7/2017
20874CLE 6/7/2017 6/8/2017
2081740DEN 6/8/2017 6/18/2017
df_2:
>>> df_2 = pd.DataFrame(
{"SVDiscrep_Merge": ["2081916SAN", "2081916SAN", "2081916SAN","2081740DEN"],
"REPORT_DT": [datetime(2017,6,1), datetime(2017,6,3), datetime(2017,6,4), datetime(2017,6,9)],
"ColA": ["A", "B", "C", "D"]})
>>> df_2
SVDiscrep_Merge REPORT_DT ColA
2081916SAN 6/1/2017 A
2081916SAN 6/3/2017 B
2081916SAN 6/4/2017 C
2081740DEN 6/9/2017 D
I want to apply the following logic:
Left-merge df_2 onto df_1 if (and only if) SVDiscrep_Merge is equal in both DataFrames and the REPORT_DT column in df_2 is >= RON_DATE and < Next SV1 Date in df_1.
This is my desired output:
SVDiscrep_Merge RON_DATE Next SV1 Date ColA
2081916SAN 6/1/2017 6/4/2017 A
2081916SAN 6/4/2017 6/6/2017 B
2081916SAN 6/6/2017 6/7/2017
2081242DFW 6/4/2017 6/6/2017
2081248ORD 6/6/2017 6/7/2017
20874CLE 6/7/2017 6/8/2017
2081740DEN 6/8/2017 6/18/2017 D
I know how to do the merge in Python without that date logic... but with the date logic (after searching Google) I am at a loss.
Answer 0 (score: 2)
You can do a left merge on SVDiscrep_Merge and then filter the result with the following boolean mask:
mask = (((result['RON_DATE'] <= result['REPORT_DT'])
& (result['REPORT_DT'] < result['Next SV1 Date']))
| pd.isnull(result['REPORT_DT']))
import datetime as DT
import pandas as pd
df_1 = pd.DataFrame(
{"SVDiscrep_Merge": ["2081916SAN", "2081242DFW", "2081248ORD","20874CLE", "2081740DEN"],
"RON_DATE": [DT.datetime(2017,6,1), DT.datetime(2017,6,4), DT.datetime(2017,6,6), DT.datetime(2017,6,7), DT.datetime(2017,6,8)],
"Next SV1 Date": [DT.datetime(2017,6,4), DT.datetime(2017,6,6), DT.datetime(2017,6,7), DT.datetime(2017,6,8), DT.datetime(2017, 6, 18)]})
df_2 = pd.DataFrame(
{"SVDiscrep_Merge": ["2081916SAN", "2081916SAN", "2081916SAN","2081740DEN"],
"REPORT_DT": [DT.datetime(2017,6,1), DT.datetime(2017,6,3), DT.datetime(2017,6,4), DT.datetime(2017,6,9)],
"ColA": ["A", "B", "C", "D"]})
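# Left merge keeps every df_1 row; keys absent from df_2 get NaT/NaN
result = pd.merge(df_1, df_2, on='SVDiscrep_Merge', how='left')
# Keep rows with RON_DATE <= REPORT_DT < Next SV1 Date, plus unmatched rows
mask = (((result['RON_DATE'] <= result['REPORT_DT'])
& (result['REPORT_DT'] < result['Next SV1 Date']))
| pd.isnull(result['REPORT_DT']))
# REPORT_DT was only needed for the filter, so drop it from the output
result = result.loc[mask].drop('REPORT_DT', axis=1)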
print(result)
yields
Next SV1 Date RON_DATE SVDiscrep_Merge ColA
0 2017-06-04 2017-06-01 2081916SAN A
1 2017-06-04 2017-06-01 2081916SAN B
3 2017-06-06 2017-06-04 2081242DFW NaN
4 2017-06-07 2017-06-06 2081248ORD NaN
5 2017-06-08 2017-06-07 20874CLE NaN
6 2017-06-18 2017-06-08 2081740DEN D
This is not the desired result you posted, but it is consistent with the logic you described.
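One caveat with filtering after the merge: a df_1 row whose key does appear in df_2, but with no REPORT_DT inside its date range, is removed entirely instead of being kept with an empty ColA. If strict left-join behaviour is wanted, a possible variation is sketched below. The helper name range_left_merge and the temporary "_row" column are hypothetical, introduced only for illustration, and the two small frames are a reduced version of the sample data chosen to trigger that case.

```python
import datetime as DT
import pandas as pd


def range_left_merge(left, right):
    """Left-merge `right` onto `left` on SVDiscrep_Merge, keeping a right row
    only when RON_DATE <= REPORT_DT < Next SV1 Date (hypothetical helper)."""
    # Tag each left row with a temporary id (assumes no existing "_row" column).
    left = left.reset_index(drop=True)
    left["_row"] = left.index
    # Pair every left row with every right row that shares the key.
    pairs = left.merge(right, on="SVDiscrep_Merge", how="inner")
    # Keep only the pairs whose report date falls in the half-open range.
    in_range = pairs[(pairs["RON_DATE"] <= pairs["REPORT_DT"])
                     & (pairs["REPORT_DT"] < pairs["Next SV1 Date"])]
    # Join the surviving matches back; left rows without one get NaN in ColA.
    out = left.merge(in_range[["_row", "ColA"]], on="_row", how="left")
    return out.drop(columns="_row")


df_1 = pd.DataFrame({
    "SVDiscrep_Merge": ["2081916SAN", "2081740DEN"],
    "RON_DATE": [DT.datetime(2017, 6, 1), DT.datetime(2017, 6, 8)],
    "Next SV1 Date": [DT.datetime(2017, 6, 4), DT.datetime(2017, 6, 18)],
})
df_2 = pd.DataFrame({
    "SVDiscrep_Merge": ["2081916SAN", "2081740DEN"],
    # 6/4 is outside [6/1, 6/4) for SAN; 6/9 is inside [6/8, 6/18) for DEN
    "REPORT_DT": [DT.datetime(2017, 6, 4), DT.datetime(2017, 6, 9)],
    "ColA": ["C", "D"],
})

result = range_left_merge(df_1, df_2)
print(result)
```

Here the 2081916SAN row stays in the result with NaN in ColA, whereas the merge-then-mask approach would drop it because its only REPORT_DT is out of range. For the sample data in the question, both approaches should give the same rows.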