I have two tables that I want to left join, keeping only the rows where my date column in df1 falls between the from and to columns in df2.
Note row 6: it has no ClockInDate, which ends up causing problems.
df1:
  Company Resource ClockInDate
0       A     ResA  2019-02-09
1       A     ResB  2019-02-09
2       A     ResC  2019-02-09
3       B     ResD  2019-02-09
4       B     ResE  2019-02-09
5       B     ResF  2019-02-09
6       B     ResG         NaT
df2:
  Company Resource EffectiveFrom EffectiveTo
0       A     ResA    2018-01-01  2018-12-31
1       A     ResA    2019-01-01  2099-12-31
2       A     ResB    2018-01-01  2018-12-31
3       A     ResB    2019-01-01  2099-12-31
4       B     ResE    2018-01-01  2018-12-31
5       B     ResE    2019-01-01  2099-12-31
6       B     ResF    2018-01-01  2018-12-31
7       B     ResF    2019-01-01  2099-12-31
8       B     ResG    2018-01-01  2018-12-31
9       B     ResG    2019-01-01  2099-12-31
I thought I could do this in pandas with a left merge and then apply the filter afterwards, but that gives a different output.
In SQL you can put this filter in the ON clause, which is not the same as applying it in the WHERE clause after the join:
SELECT t1.company,
       t1.resource,
       t2.company,
       t2.resource,
       t1.ClockInDate,
       t2.EffectiveFrom,
       t2.EffectiveTo
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
                   AND t1.company = t2.company
                   AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo
Notice the part: AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo
Note: in the SQL code, t1 is df1 and t2 is df2.
SQL output (this is my expected output):
  t1.Company t1.Resource t1.ClockInDate t2.EffectiveFrom t2.EffectiveTo
0          A        ResA     2019-02-09       2019-01-01     2099-12-31
1          A        ResB     2019-02-09       2019-01-01     2099-12-31
2          A        ResC            NaT              NaT            NaT
3          B        ResD            NaT              NaT            NaT
4          B        ResE     2019-02-09       2019-01-01     2099-12-31
5          B        ResF     2019-02-09       2019-01-01     2099-12-31
6          B        ResG            NaT              NaT            NaT
This is my code in Python (copy and paste to reproduce):
df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')
df_final = df_merge[df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]
Python output:
  Company Resource ClockInDate EffectiveFrom EffectiveTo
1       A     ResA  2019-02-09    2019-01-01  2099-12-31
3       A     ResB  2019-02-09    2019-01-01  2099-12-31
4       A     ResC  2019-02-09           NaT         NaT
5       B     ResD  2019-02-09           NaT         NaT
7       B     ResE  2019-02-09    2019-01-01  2099-12-31
9       B     ResF  2019-02-09    2019-01-01  2099-12-31
Notice that the last row, resource ResG, is not included in my Python output.
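To reproduce the Python side, here is a minimal sketch that rebuilds df1 and df2 from the tables in the question and reruns the merge and filter. The construction of the frames (the use of pd.to_datetime and a list comprehension for df2) is an assumption, since only the printed tables are available. It also prints the ResG rows of the merged frame, which suggests why they vanish: ResG has matches in df2, so EffectiveFrom is not null, and a NaT ClockInDate makes between() evaluate to False.

```python
import pandas as pd

# df1 as printed in the question; row 6 (ResG) has no ClockInDate.
df1 = pd.DataFrame({
    "Company": list("AAABBBB"),
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})

# df2 as printed in the question: two effective periods per resource.
resources = [("A", "ResA"), ("A", "ResB"), ("B", "ResE"), ("B", "ResF"), ("B", "ResG")]
periods = [("2018-01-01", "2018-12-31"), ("2019-01-01", "2099-12-31")]
df2 = pd.DataFrame(
    [(c, r, start, end) for c, r in resources for start, end in periods],
    columns=["Company", "Resource", "EffectiveFrom", "EffectiveTo"],
)
df2["EffectiveFrom"] = pd.to_datetime(df2["EffectiveFrom"])
df2["EffectiveTo"] = pd.to_datetime(df2["EffectiveTo"])

# The merge and filter from the question.
df_merge = pd.merge(df1, df2, on=["Company", "Resource"], how="left")
in_range = df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo)
df_final = df_merge[in_range | df_merge.EffectiveFrom.isnull()]
print(df_final)

# Both merged ResG rows fail the filter: in_range is False (NaT comparison)
# and EffectiveFrom is populated, so isnull() is False as well.
print(df_merge[df_merge.Resource == "ResG"].assign(in_range=in_range))
```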
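Answer 0: (score: 0)
After working on this project some more, I gained a bit more insight. I found a solution, though I am still hoping for a cleaner one.
This works: we can concat the rows from the original dataframe where ClockInDate.isnull is true: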
df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')
df_filter = df_merge[df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]
df_final = pd.concat([df_filter, df1[df1.ClockInDate.isnull()]], sort=True)
print(df_final)
ClockInDate Company EffectiveFrom EffectiveTo Resource
1 2019-02-09 A 2019-01-01 2099-12-31 ResA
3 2019-02-09 A 2019-01-01 2099-12-31 ResB
4 2019-02-09 A NaT NaT ResC
5 2019-02-09 B NaT NaT ResD
7 2019-02-09 B 2019-01-01 2099-12-31 ResE
9 2019-02-09 B 2019-01-01 2099-12-31 ResF
6 NaT B NaT NaT ResG
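One possible cleaner variant, offered as a sketch and not as the answerer's method: find the in-range matches first, then left join them back onto df1 by row, so every df1 row survives and unmatched rows get NaT in the effective columns. The helper name left_join_between and the temporary _row column are hypothetical, and the sample frames are rebuilt from the tables in the question.

```python
import pandas as pd

def left_join_between(left, right, keys, date_col, start_col, end_col):
    """Left join where date_col must fall within [start_col, end_col] (hypothetical helper)."""
    # Tag each left row so in-range matches can be joined back to it.
    tagged = left.reset_index(drop=True).rename_axis("_row").reset_index()
    candidates = tagged.merge(right, on=keys, how="inner")
    in_range = candidates[date_col].between(candidates[start_col], candidates[end_col])
    matched = candidates.loc[in_range, ["_row", start_col, end_col]]
    # Rows without an in-range match keep their left values and get NaT on the right.
    return tagged.merge(matched, on="_row", how="left").drop(columns="_row")

# Sample data rebuilt from the question's tables (construction is an assumption).
df1 = pd.DataFrame({
    "Company": list("AAABBBB"),
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})
resources = [("A", "ResA"), ("A", "ResB"), ("B", "ResE"), ("B", "ResF"), ("B", "ResG")]
periods = [("2018-01-01", "2018-12-31"), ("2019-01-01", "2099-12-31")]
df2 = pd.DataFrame(
    [(c, r, start, end) for c, r in resources for start, end in periods],
    columns=["Company", "Resource", "EffectiveFrom", "EffectiveTo"],
)
df2["EffectiveFrom"] = pd.to_datetime(df2["EffectiveFrom"])
df2["EffectiveTo"] = pd.to_datetime(df2["EffectiveTo"])

result = left_join_between(
    df1, df2, ["Company", "Resource"], "ClockInDate", "EffectiveFrom", "EffectiveTo"
)
print(result)
```

Joining back on a per-row tag instead of on Company and Resource avoids duplicating rows if df1 ever contains the same key pair more than once.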
Answer 1: (score: -1)
This SQL, with the filter in the WHERE clause:
SELECT t1.company,
t1.resource,
t2.company,
t2.resource,
t1.ClockInDate,
t2.EffectiveFrom,
t2.EffectiveTo
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
AND t1.company = t2.company
WHERE t1.ClockInDate IS NULL --no ClockInDate to check
OR t2.company IS NULL AND t2.resource IS NULL --not rows in t2 for t1
OR t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo --ClockInDate exists, rows in t2 exist, we can now check ClockInDate to be between t2.EffectiveFrom AND t2.EffectiveTo
translates to Python as:
df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')
df_final = df_merge[df_merge.ClockInDate.isnull() | df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]
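A quick check of this filter against the question's data, as a sketch with the frames rebuilt from the printed tables (their construction is an assumption). It appears to keep ResG, but as two rows, one per effective period in df2, because a null ClockInDate passes the first condition for every merged row. That differs from the ON-clause output in the question, which has a single ResG row with empty effective dates.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
df1 = pd.DataFrame({
    "Company": list("AAABBBB"),
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})
resources = [("A", "ResA"), ("A", "ResB"), ("B", "ResE"), ("B", "ResF"), ("B", "ResG")]
periods = [("2018-01-01", "2018-12-31"), ("2019-01-01", "2099-12-31")]
df2 = pd.DataFrame(
    [(c, r, start, end) for c, r in resources for start, end in periods],
    columns=["Company", "Resource", "EffectiveFrom", "EffectiveTo"],
)
df2["EffectiveFrom"] = pd.to_datetime(df2["EffectiveFrom"])
df2["EffectiveTo"] = pd.to_datetime(df2["EffectiveTo"])

# The merge and three-part filter from this answer.
df_merge = pd.merge(df1, df2, on=["Company", "Resource"], how="left")
df_final = df_merge[
    df_merge.ClockInDate.isnull()
    | df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo)
    | df_merge.EffectiveFrom.isnull()
]
print(df_final)

# Count how many rows each resource ends up with; ResG appears twice.
print(df_final.Resource.value_counts().sort_index())
```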