I'm not sure whether this is the right group for this question. I created the following SQL code in Databricks, but I'm getting an error message:
Error in SQL statement: AnalysisException: cannot resolve 'a.COUNTRY_ID' given input columns: [a."PK_LOYALTYACCOUNT";"COUNTRY_ID";"CDC_TYPE", b."PK_LOYALTYACCOUNT";"COUNTRY_ID";"CDC_TYPE"]; line 7 pos 7;
I know the code itself is fine, because I have already run it successfully on SQL Server. The code is as follows:
tabled = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",inferSchema=True,header=True)
tablee = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tablee.csv",inferSchema=True,header=True)
tabled.createOrReplaceTempView('tabled')
tablee.createOrReplaceTempView('tablee')
%sql
; with cmn as
( SELECT a.CDC_TYPE,
a. PK_LOYALTYACCOUNT, --Add these also in CTE result set
a.COUNTRY_ID --Add these also in CTE result set
FROM tabled a
INNER JOIN tablee b
ON a.COUNTRY_ID = b.COUNTRY_ID
AND a.PK_LOYALTYACCOUNT = b.PK_LOYALTYACCOUNT
AND a.CDC_TYPE = 'U'
)
SELECT 1 AS is_deleted,
a.*
FROM tabled a
INNER JOIN cmn
ON a.CDC_TYPE = cmn.CDC_TYPE
and a.COUNTRY_ID = cmn.COUNTRY_ID
AND a.PK_LOYALTYACCOUNT = cmn.PK_LOYALTYACCOUNT
UNION ALL
SELECT 0 AS is_deleted,
b.*
FROM tablee b
INNER JOIN cmn
ON b.CDC_TYPE = cmn.CDC_TYPE
and b.COUNTRY_ID = cmn.COUNTRY_ID
AND b.PK_LOYALTYACCOUNT = cmn.PK_LOYALTYACCOUNT
UNION ALL
SELECT NULL,
a.*
FROM tabled a
WHERE a.CDC_TYPE = 'N'
UNION ALL
SELECT NULL,
b.*
FROM tablee b
WHERE b.CDC_TYPE = 'N'
When I run a simple query such as...
example1 = spark.sql("""select * from tablee""")
or
example2 = spark.sql("""select * from tabled""")
...I get the expected output, so I know the tables are there.
Any suggestions would be welcome.
Answer 0 (score: 0)
The columns were not being recognized correctly because the file's delimiter is a semicolon (;) while the read was expecting a comma. Problem solved.
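A quick way to confirm this (a minimal sketch, assuming the same file path and SparkSession as in the question) is to inspect the columns Spark actually parsed; with the wrong delimiter each DataFrame ends up with a single column whose name is the whole header row, which is why a.COUNTRY_ID cannot be resolved:

# Read with the default comma delimiter (the original, failing setup).
tabled = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv", inferSchema=True, header=True)

# Prints a single concatenated column, e.g. ['PK_LOYALTYACCOUNT;COUNTRY_ID;CDC_TYPE'],
# instead of the three separate columns the SQL expects.
print(tabled.columns)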
Answer 1 (score: 0)
Use a semicolon delimiter when reading the csv:
tabled = spark.read.option("delimiter", ";").csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",inferSchema=True,header=True)
or
tabled = spark.read.load("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",
format="csv", sep=";", inferSchema="true", header="true")
Reference: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options
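As a follow-up (a sketch, assuming the same paths and view names as in the question), both files would be re-read with the semicolon delimiter and the temp views recreated, after which the %sql cell should resolve COUNTRY_ID and the other columns:

tabled = spark.read.load("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",
                         format="csv", sep=";", inferSchema="true", header="true")
tablee = spark.read.load("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tablee.csv",
                         format="csv", sep=";", inferSchema="true", header="true")

# Re-register the views so the SQL cell sees the corrected schemas.
tabled.createOrReplaceTempView('tabled')
tablee.createOrReplaceTempView('tablee')

# Sanity check: each table should now expose COUNTRY_ID as its own column.
print(tabled.columns)
print(tablee.columns)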