Are there any constraints when using joins on a cube dataset in Spark?

Date: 2019-02-19 10:25:33

Tags: java apache-spark join apache-spark-dataset

I am trying to join a cube dataset with a cross-joined dataset using a right outer join on the two columns, as shown below, but I am unable to resolve the conflicting column references in the join.

I tried changing the join, but could not get it to work. (A reconstruction of the attempt is sketched after the crossJoin output below.)

import org.apache.spark.sql.functions.sum
import spark.implicits._ // spark: the active SparkSession (pre-imported in spark-shell)

val data = Seq(("Data2", "DATA1", 3152)).toDF("column1", "column2", "value1")
+-------+-------+------+
|column1|column2|value1|
+-------+-------+------+
|  Data2|  DATA1|  3152|
+-------+-------+------+

// cube() also emits the rollup rows, using null to mark each rolled-up column
val cubeDS = data.cube("column1", "column2").agg(sum("value1"))
+-------+-------+-----------+
|column1|column2|sum(value1)|
+-------+-------+-----------+
|  Data2|  DATA1|       3152|
|   null|   null|       3152|
|   null|  DATA1|       3152|
|  Data2|   null|       3152|
+-------+-------+-----------+

// limit(1) with no ordering picks an arbitrary row; here it is the null one
val side = cubeDS.select("column1").distinct().limit(1)
+-------+
|column1|
+-------+
|   null|
+-------+

val top = cubeDS.select("column2").distinct().limit(1)
+-------+
|column2|
+-------+
|   null|
+-------+

val cross = side.crossJoin(top)
+-------+-------+
|column1|column2|
+-------+-------+
|   null|   null|
+-------+-------+
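
The question does not include the join that was actually attempted, so the line below is only an assumed reconstruction. A right outer join on the two column names runs, but it cannot produce just the expected row: right outer keeps every cubeDS row whether or not it matches, and the null keys never match under plain equality, because in Spark SQL `null = null` evaluates to null rather than true.

// Assumed reconstruction -- not necessarily the asker's actual code.
val attempt = cross.join(cubeDS, Seq("column1", "column2"), "right_outer")
// Keeps all four cubeDS rows: the (null, null) row is retained by the
// outer join but is never actually matched against cross.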

Expected result:

+-------+-------+-----------+
|column1|column2|sum(value1)|
+-------+-------+-----------+
|   null|   null|       3152|
+-------+-------+-----------+
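
A minimal sketch of one way to get exactly the expected row (the `c1`/`c2` names and the left-semi join type are my choices, not from the question): because `side` and `top` are both derived from `cubeDS`, joining `cross` back to `cubeDS` is a self-join, so renaming one side removes the conflicting column references, and the null-safe equality operator `<=>` lets the null keys match.

// Rename the cross-join side so the self-join back to cubeDS has no
// conflicting column references (c1/c2 are hypothetical names).
val crossKeys = cross.toDF("c1", "c2")

// <=> is null-safe equality: null <=> null is true, whereas null === null
// evaluates to null and never matches. A left-semi join keeps only the
// cubeDS rows whose keys appear in crossKeys, and only cubeDS's columns.
val result = cubeDS.join(
  crossKeys,
  cubeDS("column1") <=> crossKeys("c1") && cubeDS("column2") <=> crossKeys("c2"),
  "left_semi"
)

result.show()
// +-------+-------+-----------+
// |column1|column2|sum(value1)|
// +-------+-------+-----------+
// |   null|   null|       3152|
// +-------+-------+-----------+

A plain inner join with the same `<=>` condition followed by a select of the cubeDS columns would work as well; the semi join simply avoids having to re-select and de-duplicate columns afterwards.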

0 Answers:

There are no answers yet.