PySpark: Translating MSSQL code that uses inner joins, a CASE statement, and WHERE conditions

Time: 2018-06-07 13:22:33

Tags: sql-server join filter pyspark where

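I am trying to take code I wrote in MSSQL and convert it to PySpark. I am new to PySpark.

The query contains inner joins, an embedded CASE statement, and a series of WHERE conditions for filtering.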

SELECT        Table1.Part, Table1.Serial, Table1.AIRCRAFT_NUMBER, Table1.date_removed,
                         Table2.dbo.E15.TIME, Table2.dbo.E15.TSO, data.dbo.EE18.Allowable_Time,
                         CASE WHEN (data.dbo.EE18.Allowable_Time > 0)
                         THEN data.dbo.EE18.Allowable_Time - Table2.dbo.E15.TSO END AS CAL
FROM            Table1 INNER JOIN
                         Table2.dbo.E15 ON Table1.SEQ_ID = Table2.dbo.E15.SEQ_ID AND
                         Table1.Part = Table2.dbo.E15.Part AND
                         Table1.Serial = Table2.dbo.E15.Serial AND
                         Table1.DATE_REMOVED_DESCENDING = Table2.dbo.E15.DATE_REMOVED_DESCENDING INNER JOIN
                         data.dbo.EE18 ON Table2.dbo.E15.Part = data.dbo.EE18.PART_NUMBER AND
                         Table2.dbo.E15.TIME = data.dbo.EE18.TIME
WHERE        (Table1.Part LIKE '18%') AND (Table2.dbo.E15.TIME = 'I') AND
                         (data.dbo.EE18.Allowable_Time > 0) AND (Table2.dbo.E15.TSO <= 2) OR
                         (Table1.Part LIKE '18%') AND (Table2.dbo.E15.TIME = 'T') AND
                         (data.dbo.EE18.Allowable_Time > 0) AND (Table2.dbo.E15.TSO <= 20) OR
                         (Table1.Part LIKE '18%') AND (Table2.dbo.E15.TIME = 'L') AND
                         (data.dbo.EE18.Allowable_Time > 0) AND (Table2.dbo.E15.TSO <= 8)
ORDER BY Table1.date_removed DESC

What would the query above look like in PySpark code? Any help is much appreciated :)

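1 answer:

Answer 0: (score: 0)

This is not really an answer to your question, but it shows how much clearer your query looks with some formatting. I also moved the predicates around to avoid redundancy and used parentheses to make the AND/OR logic explicit.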

SELECT Table1.Part
    , Table1.Serial
    , Table1.AIRCRAFT_NUMBER
    , Table1.date_removed
    , Table2.dbo.E15.TIME
    , Table2.dbo.E15.TSO
    , data.dbo.EE18.Allowable_Time
    , CASE WHEN (data.dbo.EE18.Allowable_Time > 0) THEN data.dbo.EE18.Allowable_Time - Table2.dbo.E15.TSO END AS CAL
FROM Table1
INNER JOIN Table2.dbo.E15 ON Table1.SEQ_ID = Table2.dbo.E15.SEQ_ID 
                        AND Table1.Part = Table2.dbo.E15.Part 
                        AND Table1.Serial = Table2.dbo.E15.Serial 
                        AND Table1.DATE_REMOVED_DESCENDING = Table2.dbo.E15.DATE_REMOVED_DESCENDING 
INNER JOIN data.dbo.EE18 ON Table2.dbo.E15.Part = data.dbo.EE18.PART_NUMBER 
                        AND Table2.dbo.E15.TIME = data.dbo.EE18.TIME
WHERE Table1.Part LIKE '18%'
    AND data.dbo.EE18.Allowable_Time > 0
    -- AND binds tighter than OR, so the whole OR group needs its own parentheses
    AND
    (
        (
            Table2.dbo.E15.TIME = 'I'
            AND
            Table2.dbo.E15.TSO <= 2
        )
        OR
        (
            Table2.dbo.E15.TIME = 'T'
            AND
            Table2.dbo.E15.TSO <= 20
        )
        OR
        (
            Table2.dbo.E15.TIME = 'L'
            AND
            Table2.dbo.E15.TSO <= 8
        )
    )
ORDER BY Table1.date_removed DESC
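
Pulling shared predicates out of an OR chain only keeps the filter's meaning if the OR group stays inside one pair of parentheses, because AND binds tighter than OR in T-SQL (and in Python). A quick way to convince yourself is to brute-force both forms over a small grid of inputs. The sketch below does that in plain Python; the function names and the sample grid are made up for illustration, and SQL NULL handling is not modelled.

```python
from itertools import product


def original_where(part18, allowable_pos, time, tso):
    """WHERE clause as written in the question: three AND groups joined by OR."""
    return (
        part18 and time == "I" and allowable_pos and tso <= 2
        or part18 and time == "T" and allowable_pos and tso <= 20
        or part18 and time == "L" and allowable_pos and tso <= 8
    )


def factored_where(part18, allowable_pos, time, tso):
    """Shared predicates pulled out, OR group wrapped in one pair of parentheses."""
    return part18 and allowable_pos and (
        (time == "I" and tso <= 2)
        or (time == "T" and tso <= 20)
        or (time == "L" and tso <= 8)
    )


def ungrouped_where(part18, allowable_pos, time, tso):
    """Same as factored_where but WITHOUT the outer parentheses around the OR group."""
    return (
        part18 and allowable_pos and (time == "I" and tso <= 2)
        or (time == "T" and tso <= 20)
        or (time == "L" and tso <= 8)
    )


# Hypothetical sample grid: both boolean predicates, a few TIME codes, TSO values
# on either side of each threshold.
cases = list(product([True, False], [True, False],
                     ["I", "T", "L", "X"], [0, 2, 3, 8, 9, 20, 21]))

mismatches_factored = [c for c in cases if original_where(*c) != factored_where(*c)]
mismatches_ungrouped = [c for c in cases if original_where(*c) != ungrouped_where(*c)]

print("factored form differs on", len(mismatches_factored), "cases")
print("ungrouped form differs on", len(mismatches_ungrouped), "cases")
```

The ungrouped form lets rows through whenever the 'T' or 'L' branch is true, even if Part does not start with 18 or Allowable_Time is not positive.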