I am working on a project where I need to pass multiple column names, coming from different sources, to coalesce dynamically.
e1.csv
id,code,type
1,,A
2,,
3,123,I
e2.csv
id,code,type
1,456,A
2,789,A1
3,,C
Dataset<Row> df1 = spark.read().format("csv").option("header", "true").load("C:\\Users\\System2\\Videos\\folder\\e1.csv");
Dataset<Row> df2 = spark.read().format("csv").option("header", "true").load("C:\\Users\\System2\\Videos\\folder\\e2.csv");
Dataset<Row> newDS = df1.as("a").join(df2.as("b")).where("a.id== b.id").selectExpr("coalesce(a.id, b.id) AS `id`;coalesce(a.code, b.code) AS `code`");
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input ';' expecting <EOF>(line 1, pos 38)
I tried \n, a comma, and a semicolon as separators inside the string, but none of them worked:
Dataset<Row> newDS = df1.as("a").join(df2.as("b")).where("a.id== b.id").selectExpr("coalesce(a.id, b.id) AS `id \n coalesce(a.code, b.code) AS `code`");
Answer 0 (score: 0)
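This is a syntax error. selectExpr takes each expression as its own String argument (it is a varargs method), so the expressions cannot be joined into one string with a separator. Your code should look like this: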
Dataset<Row> newDS = df1.as("a").join(df2.as("b")).where("a.id== b.id").selectExpr("coalesce(a.id, b.id) AS `id`","coalesce(a.code, b.code) AS `code`");
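Since the question asks about supplying the column names dynamically, here is a minimal sketch of one way to build the expression array from a list of names. The class name CoalesceExprs and the helper coalesceExprs are hypothetical, made up for illustration. It assumes the aliases "a" and "b" from the code above and plain column names that need no extra escaping. The Spark call is left as a comment so the snippet runs without Spark on the classpath.

```java
import java.util.Arrays;
import java.util.List;

class CoalesceExprs {
    // Hypothetical helper: builds one "coalesce(left.col, right.col) AS `col`"
    // expression per column name. Assumes names contain no backticks.
    static String[] coalesceExprs(List<String> columns, String left, String right) {
        return columns.stream()
                .map(c -> String.format("coalesce(%s.%s, %s.%s) AS `%s`", left, c, right, c, c))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // The column list could come from a config file or any other source.
        String[] exprs = coalesceExprs(Arrays.asList("id", "code", "type"), "a", "b");
        for (String e : exprs) {
            System.out.println(e);
        }
        // A String[] can be passed directly where String... is expected:
        // Dataset<Row> newDS = df1.as("a").join(df2.as("b"))
        //         .where("a.id == b.id").selectExpr(exprs);
    }
}
```

Because selectExpr is declared with a String... parameter, the array is accepted as-is, so the number of columns does not have to be known at compile time.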