Error when calling a subquery in Spark

Posted: 2016-09-05 10:34:07

Tags: mysql apache-spark pyspark apache-spark-sql

I am new to Spark. I am working on a project where I am using PySpark to convert SQL Server queries to Spark SQL. The setup is sketched just below for context, and the full query I am running follows after it.
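The SparkSession and the source tables are set up roughly like this (a minimal sketch; the JDBC URL, credentials, and app name are placeholders, not my real configuration):

from pyspark.sql import SparkSession

# Minimal sketch of the setup; the connection details below are placeholders
# for how the SQL Server tables are actually loaded and registered.
spark = SparkSession.builder.appName("cm-ledger-conversion").getOrCreate()

jdbc_url = "jdbc:sqlserver://<host>:1433;databaseName=<db>"   # placeholder
for table in ["cmledg", "cmledgapply", "bldg", "leas", "period",
              "currdate", "currdate2", "currdateint",
              "emptydefault", "emptydept"]:
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", table)
          .option("user", "<user>")          # placeholder
          .option("password", "<password>")  # placeholder
          .load())
    df.createOrReplaceTempView(table)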

spark.sql("SELECT b.entityid                        AS EntityID, 
       cm.bldgid                                    AS BldgID, 
       cm.leasid                                    AS LeaseID, 
       COALESCE(l.suitid, (SELECT emptydefault 
                           FROM   emptydefault))    AS SuiteID, 
       cm.inccat                                    AS IncomeCat, 
       (SELECT currdateint 
        FROM   currdateint)                         AS AsAtDate, 
       COALESCE(CASE 
                  WHEN cm.department = '@' THEN (SELECT emptydept 
                                                 FROM   emptydept) 
                  ELSE cm.department 
                END, (SELECT emptydept 
                      FROM   emptydept))            AS Dept, 
       cm.tranamt 
       + COALESCE((SELECT Sum(amt)
                   FROM   cmledgapply cma
                          JOIN cmledg cm2
                            ON cma.tranid = cm2.tranid
                               AND cm2.trandate <= (SELECT currdate2
                                                    FROM   currdate2)
                               AND ( cm2.glref IS NULL
                                      OR cm2.glref = 'N' )
                   WHERE  cma.ptranid = cm.tranid), 0) -
       COALESCE((SELECT Sum(amt) 
                 FROM   cmledgapply cma 
                        JOIN cmledg cm2 
                          ON cma.ptranid = cm2.tranid 
                             AND cm2.trandate <= (SELECT currdate2 
                                                  FROM   currdate2) 
                             AND ( cm2.glref IS NULL 
                                    OR cm2.glref = 'N' ) 
                 WHERE  cma.tranid = cm.tranid), 0) AS OpenAmt, 
       cm.trandate                                  AS InvoiceDate, 
       Datediff(cm.trandate, (SELECT currdate 
                              FROM   currdate)), 
       CASE 
         WHEN period.dateclsd IS NULL THEN 'Open' 
         ELSE 'Closed' 
       END                                          AS GLClosedStatus, 
       'Unposted'                                   AS GLPostedStatus, 
       'Unpaid'                                     AS PaidStatus 
FROM   cmledg cm 
       JOIN bldg b 
         ON cm.bldgid = b.bldgid 
       JOIN leas l 
         ON cm.bldgid = l.bldgid 
            AND cm.leasid = l.leasid 
       LEFT OUTER JOIN period 
                    ON b.entityid = period.entityid 
                       AND cm.period = period.period 
WHERE  cm.trandate <= (SELECT currdate2 
                       FROM   currdate2) 
       AND Round(( cm.tranamt 
                    + COALESCE((SELECT Sum(amt)
                                FROM   cmledgapply cma
                                       JOIN cmledg cm2
                                         ON cma.tranid = cm2.tranid
                                            AND cm2.trandate <= (SELECT currdate2
                                                                 FROM   currdate2)
                                            AND ( cm2.glref IS NULL
                                                   OR cm2.glref = 'N' )
                                WHERE  cma.ptranid = cm.tranid), 0) -
                               COALESCE((SELECT Sum(amt) 
                                         FROM   cmledgapply cma 
                                                JOIN cmledg cm2 
                                                  ON cma.ptranid = cm2.tranid 
                                                     AND cm2.trandate <= (SELECT 
                                                         currdate2 
                                                                          FROM 
                                                         currdate2) 
                                                     AND ( cm2.glref IS NULL 
                                                            OR cm2.glref = 'N' ) 
                                         WHERE  cma.tranid = cm.tranid), 0) ), 2 
           ) <> 0 
       AND ( cm.glref IS NULL 
              OR cm.glref = 'N' )").show()

Basically, inside spark.sql() the whole query is a single line of code; I have only formatted it here for readability. I am getting an error like this:

u"Invalid call to dataType on unresolved object, tree: 'coalesce(scalar-subquery#1402 [], 0)

I have tried some debugging, and the error seems to happen at or very near the last two COALESCEs, but I have not figured it out yet. It would be very helpful if someone could give me some pointers. Note that CurrDate is a date-type table, while CurrDate2 is a timestamp type. I am using PySpark here; the reduced piece I have been testing in isolation is sketched below.
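To narrow it down I have been running reduced pieces of the statement on their own. This is roughly the smallest piece I have been trying (same tables and columns as the full query above; the applied_amt alias is only for this test):

# Roughly the reduced statement used while debugging: a single COALESCE
# around a correlated scalar subquery, against the same temp views as above.
spark.sql("""
    SELECT cm.tranid,
           COALESCE((SELECT Sum(amt)
                     FROM   cmledgapply cma
                            JOIN cmledg cm2
                              ON cma.tranid = cm2.tranid
                     WHERE  cma.ptranid = cm.tranid), 0) AS applied_amt
    FROM   cmledg cm
""").show()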

Thanks in advance.

0 answers:

There are no answers yet.