I am new to Spark. I am working on a project where I am using PySpark to convert SQL Server queries to Spark SQL.
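For context, the single-value lookup tables the query refers to (currdate, currdate2, currdateint, emptydefault, emptydept) are one-row temp views. They are registered along these lines (a rough sketch; the literal values below are placeholders for this post, since the real ones are computed upstream; currdate holds a date and currdate2 a timestamp):

# Placeholder setup for this post; the real values come from elsewhere in the job
spark.sql("SELECT current_date() AS currdate").createOrReplaceTempView("currdate")
spark.sql("SELECT current_timestamp() AS currdate2").createOrReplaceTempView("currdate2")
spark.sql("SELECT 20170101 AS currdateint").createOrReplaceTempView("currdateint")
spark.sql("SELECT '' AS emptydefault").createOrReplaceTempView("emptydefault")
spark.sql("SELECT 'Unknown' AS emptydept").createOrReplaceTempView("emptydept")

With those in place, this is the query I run: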
spark.sql("SELECT b.entityid AS EntityID,
cm.bldgid AS BldgID,
cm.leasid AS LeaseID,
COALESCE(l.suitid, (SELECT emptydefault
FROM emptydefault)) AS SuiteID,
cm.inccat AS IncomeCat,
(SELECT currdateint
FROM currdateint) AS AsAtDate,
COALESCE(CASE
WHEN cm.department = '@' THEN (SELECT emptydept
FROM emptydept)
ELSE cm.department
END, (SELECT emptydept
FROM emptydept)) AS Dept,
cm.tranamt
+ COALESCE((SELECT Sum(amt) FROM cmledgapply cma JOIN cmledg cm2 ON
cma.tranid =
cm2.tranid AND cm2.trandate <= (SELECT currdate2 FROM currdate2) AND
(cm2.glref
IS NULL OR cm2.glref = 'N') WHERE cma.ptranid = cm.tranid), 0) -
COALESCE((SELECT Sum(amt)
FROM cmledgapply cma
JOIN cmledg cm2
ON cma.ptranid = cm2.tranid
AND cm2.trandate <= (SELECT currdate2
FROM currdate2)
AND ( cm2.glref IS NULL
OR cm2.glref = 'N' )
WHERE cma.tranid = cm.tranid), 0) AS OpenAmt,
cm.trandate AS InvoiceDate,
Datediff(cm.trandate, (SELECT currdate
FROM currdate)),
CASE
WHEN period.dateclsd IS NULL THEN 'Open'
ELSE 'Closed'
END AS GLClosedStatus,
'Unposted' AS GLPostedStatus,
'Unpaid' AS PaidStatus
FROM cmledg cm
JOIN bldg b
ON cm.bldgid = b.bldgid
JOIN leas l
ON cm.bldgid = l.bldgid
AND cm.leasid = l.leasid
LEFT OUTER JOIN period
ON b.entityid = period.entityid
AND cm.period = period.period
WHERE cm.trandate <= (SELECT currdate2
FROM currdate2)
AND Round(( cm.tranamt
+ COALESCE((SELECT Sum(amt) FROM cmledgapply cma JOIN cmledg
cm2 ON
cma.tranid
=
cm2.tranid AND cm2.trandate <= (SELECT currdate2 FROM
currdate2) AND
(cm2.glref
IS NULL OR cm2.glref = 'N') WHERE cma.ptranid = cm.tranid), 0
) -
COALESCE((SELECT Sum(amt)
FROM cmledgapply cma
JOIN cmledg cm2
ON cma.ptranid = cm2.tranid
AND cm2.trandate <= (SELECT
currdate2
FROM
currdate2)
AND ( cm2.glref IS NULL
OR cm2.glref = 'N' )
WHERE cma.tranid = cm.tranid), 0) ), 2
) <> 0
AND ( cm.glref IS NULL
OR cm.glref = 'N' )").show()
In reality the whole statement is a single line inside spark.sql(); I have only formatted it here for readability. I am getting an error like this:
u"Invalid call to dataType on unresolved object, tree: 'coalesce(scalar-subquery#1402 [], 0)"
I have done some debugging, and the error seems to come from the last two COALESCEs, or somewhere near them, but I have not been able to figure it out. Any pointers would be very helpful. Note that currdate is a date-typed table while currdate2 is a timestamp-typed table, and I am using PySpark here.
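To try to narrow it down, I isolated just that shape (an aggregate scalar subquery, correlated on the outer row and wrapped in COALESCE inside an arithmetic expression) on tiny stand-in tables. The table and column names below are made up for the test, not my real schema:

# Hypothetical stand-ins for cmledg / cmledgapply, just to exercise the pattern
spark.createDataFrame([(1, 100.0)], ["tranid", "tranamt"]).createOrReplaceTempView("t_ledg")
spark.createDataFrame([(10, 1, 25.0)], ["tranid", "ptranid", "amt"]).createOrReplaceTempView("t_apply")

# Same shape as the two trailing COALESCEs in the big query:
# a correlated scalar subquery inside COALESCE inside an arithmetic expression
spark.sql("""SELECT t.tranamt
                    + COALESCE((SELECT Sum(a.amt)
                                FROM t_apply a
                                WHERE a.ptranid = t.tranid), 0) AS OpenAmt
             FROM t_ledg t""").show()

If this small version fails the same way, the problem would be the COALESCE over a correlated scalar subquery itself rather than anything specific to my joins, but I have not confirmed that yet.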
Thanks in advance.