Here is my code; it fails at the except call:
A.except(B)
The original schemas of A and B look like this:

Column A: string
Column B: string
Column C: string

This is before I apply
.withColumn("Column B", explode(col("Column B")))
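For context, here is a minimal sketch of the pipeline that produces A. The SparkSession setup and the literal rows are made-up stand-ins, but the groupBy/collect_list/explode shape matches the logical plan in the error below:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, explode}

val spark = SparkSession.builder.master("local[1]").appName("repro").getOrCreate()
import spark.implicits._

// Toy stand-ins for the real CSV inputs; every column starts out as string.
val raw = Seq(("1", "x", "c1"), ("1", "y", "c1")).toDF("Column A", "Column B", "Column C")
val B   = Seq(("1", "x", "c1")).toDF("Column A", "Column B", "Column C")

// Same shape as the failing job: collect_list turns Column B into
// array<string>, then the explode is meant to turn it back into string.
val A = raw
  .groupBy("Column A", "Column C")
  .agg(collect_list("Column B").as("Column B"))
  .withColumn("Column B", explode(col("Column B")))

val diff = A.except(B)   // the call that fails in the real job
```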
This is the exact error message I get:
User class threw exception: org.apache.spark.sql.AnalysisException:
Except can only be performed on tables with the compatible column
types. array<string> <> string at the second column of the second
table;;
'Except
:- Project [Column A#14, Column B#170, Column C#112]
:  +- Generate explode(Column B#126), true, false, [Column B#170]
:     +- Project [Column A#14, Column B#126, Column C#112]
:        +- Aggregate [Column A#14, Column C#112], [Column A#14, Column C#112, collect_list(Column B#15, 0, 0) AS Column B#126]
:           +- Project [Column A#14, Column B#15, zeroString#16, Column C#9, substring_index(Column C#33, :0, 1) AS Column C#112]
:              +- Deduplicate [Column C#9, Column C#33, zeroString#16, Column B#15, Column A#14], false
:                 +- Project [Column A#14, Column B#15, zeroString#16, Column C#9, Column C#33]
:                    +- Generate explode(if ((isnull(0) || isnull(10))) null else UDF(Column C#9, 0, 10)), true, false, [Column C#33]
:                       +- Filter (UDF(Column C#9) >= 100)
:                          +- Filter Column A#14 IN (1)
:                             +- Project [rawKey#5[0] AS Column A#14, rawKey#5[1] AS Column B#15, rawKey#5[2] AS zeroString#16, Column C#9]
:                                +- Project [rawKey#5, rawValue#1, split(rawValue#1, \|) AS Column C#9]
:                                   +- Project [split(rawKey#0, \|) AS rawKey#5, rawValue#1]
:                                      +- Relation[rawKey#0,rawValue#1] csv
+- LogicalRDD [Column A#162, Column B#163, Column C#164]
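Since except compares the two sides column-by-column by position, I tried reproducing the type check the analyzer performs. The DataFrames here are hypothetical stand-ins with the same array<string> vs string mismatch the exception reports:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[1]").appName("schema-check").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins: the second column is array<string> on one
// side and string on the other, as in the exception above.
val A = Seq(("1", Seq("x"), "c1")).toDF("Column A", "Column B", "Column C")
val B = Seq(("1", "x", "c1")).toDF("Column A", "Column B", "Column C")

// Walk the two schemas positionally, the same way the analyzer check
// behind except does, to pinpoint the offending column.
A.schema.fields.zip(B.schema.fields).zipWithIndex.foreach {
  case ((fa, fb), i) =>
    val tag = if (fa.dataType == fb.dataType) "ok" else "MISMATCH"
    println(s"column $i: ${fa.dataType.simpleString} <> ${fb.dataType.simpleString} $tag")
}
```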