pyspark.sql.utils.ParseException: u"\nextraneous input 'xxx' expecting {')', ','}"

Time: 2019-02-08 14:54:00

Tags: sql apache-spark pyspark pyspark-sql

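I have 2 main tables: flights and holidays.

A flight is identified by: outboundlegid, inboundlegid, agent, querydatetime. The other columns relevant to this question are out_date and in_date, which indicate when the flight departs and when it returns.

The holidays columns are start, end, type.

I want to determine whether a flight's out/in dates intersect with anything in the holidays table.

I followed some of the advice in "PySpark: How to add columns whose data come from a query (similar to subquery for each row)" to determine whether the out/in dates intersect any holiday.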
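To make the intended result concrete, here is a minimal plain-Python sketch of the four flags the query is meant to compute. The sample rows and the helper names `falls_in` and `flag_flight` are made up for illustration; only the column names come from the question. The comparison is inclusive on both ends, as with SQL `BETWEEN`.

```python
from datetime import date

# Hypothetical sample rows; column names follow the question, values are invented.
flights = [
    {"outboundlegid": 1, "inboundlegid": 2, "agent": "a1",
     "querydatetime": date(2019, 1, 1),
     "out_date": date(2019, 2, 5), "in_date": date(2019, 2, 10)},
    {"outboundlegid": 3, "inboundlegid": 4, "agent": "a1",
     "querydatetime": date(2019, 1, 1),
     "out_date": date(2019, 3, 1), "in_date": date(2019, 3, 8)},
]
holidays = [
    {"start": date(2019, 2, 5), "end": date(2019, 2, 6), "type": "HOLIDAY"},
    {"start": date(2019, 2, 9), "end": date(2019, 2, 10), "type": "LONG_WEEKENDS"},
]


def falls_in(day, kind, holiday_rows):
    """True if `day` lies inside any row of the given type (inclusive, like BETWEEN)."""
    return any(
        h["type"] == kind and h["start"] <= day <= h["end"]
        for h in holiday_rows
    )


def flag_flight(flight, holiday_rows):
    """Compute the four boolean flags for one flight row."""
    return {
        "out_is_holiday": falls_in(flight["out_date"], "HOLIDAY", holiday_rows),
        "out_is_longweekends": falls_in(flight["out_date"], "LONG_WEEKENDS", holiday_rows),
        "in_is_holiday": falls_in(flight["in_date"], "HOLIDAY", holiday_rows),
        "in_is_longweekends": falls_in(flight["in_date"], "LONG_WEEKENDS", holiday_rows),
    }


for f in flights:
    print(f["outboundlegid"], flag_flight(f, holidays))
```

The `any(...)` over all holiday rows plays the same role as the CROSS JOIN followed by `array_contains(collect_set(...), true)` in the SQL: a flag is true if at least one holiday row matches.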

However, I get: "pyspark.sql.utils.ParseException: u"\nextraneous input 'outboundlegid' expecting {')', ','}(line 35, pos 12)". What is going on here?

     

File "script_2019-02-08-10-46-14.py", line 182, in <module>
  """)
File "/mnt/yarn/usercache/root/appcache/application_1549622095592_0002/container_1549622095592_0002_01_000001/pyspark.zip/pyspark/sql/session.py", line 603, in sql
File "/mnt/yarn/usercache/root/appcache/application_1549622095592_0002/container_1549622095592_0002_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1549622095592_0002/container_1549622095592_0002_01_000001/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException: u"\nextraneous input 'outboundlegid' expecting {')', ','}(line 35, pos 12)\n\n== SQL ==\n\n WITH t (\n SELECT \n f.outboundlegid,\n f.inboundlegid,\n f.agent,\n f.querydatetime,\n CASE WHEN type = 'HOLIDAY' AND (out_date BETWEEN start AND end)\n THEN true\n ELSE false\n END out_is_holiday,\n CASE WHEN type = 'LONG_WEEKENDS' AND (out_date BETWEEN start AND end)\n THEN true\n ELSE false\n END out_is_longweekends,\n CASE WHEN type = 'HOLIDAY' AND (in_date BETWEEN start AND end)\n THEN true\n ELSE false\n END in_is_holiday,\n CASE WHEN type = 'LONG_WEEKENDS' AND (in_date BETWEEN start AND end)\n THEN true\n ELSE false\n END in_is_longweekends\n FROM flights f\n CROSS JOIN holidays h\n )\n SELECT \n f.*,\n t1.out_is_holiday,\n t1.out_is_longweekends,\n t1.in_is_holiday,\n t1.in_is_longweekends,\n FROM (\n SELECT \n outboundlegid,\n------------^^^\n inboundlegid,\n agent,\n querydatetime,\n CASE WHEN array_contains(collect_set(out_is_holiday), true)\n THEN true\n ELSE false\n END out_is_holiday,\n CASE WHEN array_contains(collect_set(out_is_longweekends), true)\n THEN true\n ELSE false\n END out_is_longweekends,\n CASE WHEN array_contains(collect_set(in_is_holiday), true)\n THEN true\n ELSE false\n END in_is_holiday,\n CASE WHEN array_contains(collect_set(in_is_longweekends), true)\n THEN true\n ELSE false\n

What is the problem here?

resultDf = spark.sql("""
    WITH t (
        SELECT  
            f.outboundlegid,
            f.inboundlegid,
            f.agent,
            f.querydatetime,
            CASE WHEN type = 'HOLIDAY' AND (out_date BETWEEN start AND end)
                THEN true
                ELSE false
                END out_is_holiday,
            CASE WHEN type = 'LONG_WEEKENDS' AND (out_date BETWEEN start AND end)
                THEN true
                ELSE false
                END out_is_longweekends,
            CASE WHEN type = 'HOLIDAY' AND (in_date BETWEEN start AND end)
                THEN true
                ELSE false
                END in_is_holiday,
            CASE WHEN type = 'LONG_WEEKENDS' AND (in_date BETWEEN start AND end)
                THEN true
                ELSE false
                END in_is_longweekends
        FROM flights f
        CROSS JOIN holidays h
    )
    SELECT 
        f.*,
        t1.out_is_holiday,
        t1.out_is_longweekends,
        t1.in_is_holiday,
        t1.in_is_longweekends,
    FROM (
        SELECT 
            outboundlegid,   -- <<< I am guessing something is wrong with this? But why?
            inboundlegid,
            agent,
            querydatetime,
            CASE WHEN array_contains(collect_set(out_is_holiday), true)
                THEN true
                ELSE false
                END out_is_holiday,
            CASE WHEN array_contains(collect_set(out_is_longweekends), true)
                THEN true
                ELSE false
                END out_is_longweekends,
            CASE WHEN array_contains(collect_set(in_is_holiday), true)
                THEN true
                ELSE false
                END in_is_holiday,
            CASE WHEN array_contains(collect_set(in_is_longweekends), true)
                THEN true
                ELSE false
                END in_is_longweekends
        FROM t
        GROUP BY 
            querydatetime, 
            outboundlegid,
            inboundlegid,
            agent
        LIMIT 100000
    ) t1
    INNER JOIN flights f
    ON t1.querydatetime = f.querydatetime
    AND t1.outboundlegid = f.outboundlegid
    AND t1.inboundlegid = f.inboundlegid
    AND t1.agent = f.agent
    INNER JOIN agents a
    ON f.agent = a.id
    INNER JOIN airports p
    ON f.querydestinationplace = p.airportId
""")
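Judging from the SQL echoed in the error message, one likely cause is the comma after `t1.in_is_longweekends`, which sits directly before `FROM`. With that trailing comma the parser seems to keep reading the select list, so it only gives up a few tokens later, at `outboundlegid`. Below is a minimal sketch of a check that flags such commas before the string is handed to `spark.sql`. The helper `find_trailing_commas` is hypothetical and not part of PySpark, and the regular expression is a rough heuristic that ignores SQL comments and string literals.

```python
import re

# Matches a comma whose next non-whitespace token is the keyword FROM.
TRAILING_COMMA_BEFORE_FROM = re.compile(r",\s*FROM\b", re.IGNORECASE)


def find_trailing_commas(sql):
    """Return 1-based line numbers of commas that are followed directly by FROM."""
    return [
        sql.count("\n", 0, match.start()) + 1
        for match in TRAILING_COMMA_BEFORE_FROM.finditer(sql)
    ]


# Shortened stand-in for the outer SELECT of the query in the question.
query = """
    SELECT
        f.*,
        t1.in_is_longweekends,
    FROM (
        SELECT outboundlegid FROM t
    ) t1
"""

print(find_trailing_commas(query))  # prints [4]

# Dropping the comma after the last select item makes the check pass.
fixed = query.replace("t1.in_is_longweekends,", "t1.in_is_longweekends")
print(find_trailing_commas(fixed))  # prints []
```

Line numbers are counted the same way as in the error message: the newline right after the opening `"""` makes the first line of the string empty.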

0 answers:

No answers