Merging tables with a SQL-like join and a filter (between) in the left join

Time: 2019-02-08 11:00:40

Tags: python pandas tsql join filter

I have two tables that I want to combine with a left join, keeping only the rows where the ClockInDate column in df1 falls between the EffectiveFrom and EffectiveTo columns in df2.

Note row 6 of df1: it has no ClockInDate, which ends up causing the problem.

df1

  Company Resource ClockInDate
0       A     ResA  2019-02-09
1       A     ResB  2019-02-09
2       A     ResC  2019-02-09
3       B     ResD  2019-02-09
4       B     ResE  2019-02-09
5       B     ResF  2019-02-09
6       B     ResG         NaT

df2

  Company Resource EffectiveFrom EffectiveTo
0       A     ResA    2018-01-01  2018-12-31
1       A     ResA    2019-01-01  2099-12-31
2       A     ResB    2018-01-01  2018-12-31
3       A     ResB    2019-01-01  2099-12-31
4       B     ResE    2018-01-01  2018-12-31
5       B     ResE    2019-01-01  2099-12-31
6       B     ResF    2018-01-01  2018-12-31
7       B     ResF    2019-01-01  2099-12-31
8       B     ResG    2018-01-01  2018-12-31
9       B     ResG    2019-01-01  2099-12-31

I thought I could do this in pandas with a left merge and then apply the filter afterwards. But that gives different output.

In SQL you can put this filter in the ON clause, which is not the same as applying it after the join in the WHERE clause:

SELECT t1.company,
       t1.resource,
       t2.company,
       t2.resource,
       t1.ClockInDate,
       t2.EffectiveFrom,
       t2.EffectiveTo
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
                   AND t1.company = t2.company
                   AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo

Notice the part: AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo
Note: in the SQL code, df1 is t1 and df2 is t2.

SQL output (this is my expected output):

  t1.Company t1.Resource t1.ClockInDate t2.EffectiveFrom t2.EffectiveTo
0          A        ResA     2019-02-09       2019-01-01     2099-12-31
1          A        ResB     2019-02-09       2019-01-01     2099-12-31
2          A        ResC            NaT              NaT            NaT
3          B        ResD            NaT              NaT            NaT
4          B        ResE     2019-02-09       2019-01-01     2099-12-31
5          B        ResF     2019-02-09       2019-01-01     2099-12-31
6          B        ResG            NaT              NaT            NaT

Notice that the last row, for resource ResG, is not included in my Python output.

This is my code in Python, with its output (copy and paste to reproduce):

df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')
df_final = df_merge[df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]

#Output:

    Company Resource    ClockInDate EffectiveFrom   EffectiveTo
1   A       ResA        2019-02-09  2019-01-01      2099-12-31
3   A       ResB        2019-02-09  2019-01-01      2099-12-31
4   A       ResC        2019-02-09  NaT             NaT
5   B       ResD        2019-02-09  NaT             NaT
7   B       ResE        2019-02-09  2019-01-01      2099-12-31
9   B       ResF        2019-02-09  2019-01-01      2099-12-31
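
The snippet above assumes that df1 and df2 already exist. As a minimal sketch, the two frames can be rebuilt from the df1 and df2 tables in the question (the construction code is a reconstruction, not the asker's original), and the merge-then-filter steps rerun to see why ResG disappears: ResG does have matching rows in df2, so EffectiveFrom is not null after the merge, and between() evaluates to False whenever ClockInDate is NaT.

```python
import pandas as pd

# Sample data reconstructed from the df1 and df2 tables in the question
# (assumption: the dates are parsed as datetimes, missing values become NaT).
df1 = pd.DataFrame({
    "Company": list("AAABBBB"),
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})

resources = ["ResA", "ResB", "ResE", "ResF", "ResG"]
companies = ["A", "A", "B", "B", "B"]
df2 = pd.DataFrame({
    # Each resource has two validity periods, so every key is repeated twice.
    "Company": [c for c in companies for _ in range(2)],
    "Resource": [r for r in resources for _ in range(2)],
    "EffectiveFrom": pd.to_datetime(["2018-01-01", "2019-01-01"] * 5),
    "EffectiveTo": pd.to_datetime(["2018-12-31", "2099-12-31"] * 5),
})

# Same two steps as in the question: left merge, then filter.
df_merge = pd.merge(df1, df2, on=["Company", "Resource"], how="left")
in_range = df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo)
no_match = df_merge.EffectiveFrom.isnull()
df_final = df_merge[in_range | no_match]

# The two merged ResG rows fail both conditions: a NaT ClockInDate is never
# "between" anything, and EffectiveFrom is populated because df2 has ResG rows.
print(df_merge[df_merge.Resource == "ResG"])
print(df_final)
```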

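2 Answers:

Answer 0 (score: 0)

After working on this project some more, I gained more insight. I found a solution, although I was hoping for a cleaner one. But this works: we can concatenate the rows from the original dataframe where ClockInDate.isnull: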

df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')

df_filter = df_merge[df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]

df_final = pd.concat([df_filter, df1[df1.ClockInDate.isnull()]], sort=True)

print(df_final)
  ClockInDate Company EffectiveFrom EffectiveTo Resource
1  2019-02-09       A    2019-01-01  2099-12-31     ResA
3  2019-02-09       A    2019-01-01  2099-12-31     ResB
4  2019-02-09       A           NaT         NaT     ResC
5  2019-02-09       B           NaT         NaT     ResD
7  2019-02-09       B    2019-01-01  2099-12-31     ResE
9  2019-02-09       B    2019-01-01  2099-12-31     ResF
6         NaT       B           NaT         NaT     ResG
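
One possible alternative, offered only as a sketch and not as the asker's method: filter the matching pairs first, then left-merge them back onto df1, which is closer to how a filter in the ON clause behaves. The sample data is reconstructed from the tables in the question, and row_id, KEYS, pairs and matches are hypothetical helper names introduced here so that each df1 row keeps its identity even if a Company/Resource key were repeated.

```python
import pandas as pd

# Sample data reconstructed from the df1 and df2 tables in the question.
df1 = pd.DataFrame({
    "Company": list("AAABBBB"),
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})
resources = ["ResA", "ResB", "ResE", "ResF", "ResG"]
companies = ["A", "A", "B", "B", "B"]
df2 = pd.DataFrame({
    "Company": [c for c in companies for _ in range(2)],
    "Resource": [r for r in resources for _ in range(2)],
    "EffectiveFrom": pd.to_datetime(["2018-01-01", "2019-01-01"] * 5),
    "EffectiveTo": pd.to_datetime(["2018-12-31", "2099-12-31"] * 5),
})

KEYS = ["Company", "Resource"]

# Tag every df1 row with a hypothetical row_id so it can be matched back later.
left = df1.assign(row_id=range(len(df1)))

# Step 1: all key matches, then keep only pairs whose ClockInDate is in range.
pairs = left.merge(df2, on=KEYS, how="inner")
in_range = pairs.ClockInDate.between(pairs.EffectiveFrom, pairs.EffectiveTo)
matches = pairs.loc[in_range, ["row_id", "EffectiveFrom", "EffectiveTo"]]

# Step 2: left-merge the surviving pairs back, so df1 rows without a
# qualifying match (ResC, ResD, ResG) are kept with NaT effective dates.
result = left.merge(matches, on="row_id", how="left").drop(columns="row_id")
print(result)
```

In this sketch every df1 row keeps its own ClockInDate, and only the EffectiveFrom and EffectiveTo columns are left empty when no period qualifies.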

Answer 1 (score: -1)

The equivalent in SQL, with a WHERE clause:

SELECT t1.company,
        t1.resource,
        t2.company,
        t2.resource,
        t1.ClockInDate,
        t2.EffectiveFrom,
        t2.EffectiveTo
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
                    AND t1.company = t2.company
WHERE t1.ClockInDate IS NULL -- no ClockInDate to check
    OR (t2.company IS NULL AND t2.resource IS NULL) -- no rows in t2 for this t1 row
    OR t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo -- ClockInDate and t2 rows both exist, so the range check applies

which translates to Python as:

df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')
df_final = df_merge[df_merge.ClockInDate.isnull() | df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]
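
As a quick check, here is a sketch that runs this filter on sample data reconstructed from the tables in the question (the construction code is an assumption, not part of the answer). With that data, ResG is kept, but once per matching df2 row and with the effective dates filled in, because the filter is applied after the join has already paired the rows.

```python
import pandas as pd

# Sample data reconstructed from the df1 and df2 tables in the question.
df1 = pd.DataFrame({
    "Company": list("AAABBBB"),
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})
resources = ["ResA", "ResB", "ResE", "ResF", "ResG"]
companies = ["A", "A", "B", "B", "B"]
df2 = pd.DataFrame({
    "Company": [c for c in companies for _ in range(2)],
    "Resource": [r for r in resources for _ in range(2)],
    "EffectiveFrom": pd.to_datetime(["2018-01-01", "2019-01-01"] * 5),
    "EffectiveTo": pd.to_datetime(["2018-12-31", "2099-12-31"] * 5),
})

df_merge = pd.merge(df1, df2, on=["Company", "Resource"], how="left")

# The three WHERE conditions from the answer, one mask each.
no_clock_in = df_merge.ClockInDate.isnull()
in_range = df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo)
no_match = df_merge.EffectiveFrom.isnull()

df_final = df_merge[no_clock_in | in_range | no_match]

# ResG shows up twice here, once for each of its two periods in df2.
print(df_final)
```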