How do I union two exploded results when one of them is empty?

Time: 2020-05-26 07:09:30

Tags: apache-spark apache-spark-sql

Env: Spark 2.4.5

My Spark SQL:

SELECT A.*
FROM table_0
    LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
UNION
SELECT B.*
FROM table_0
    LATERAL VIEW explode(table_0.Array_1) exploded_a_values AS B

The exploded structs A and B have the same schema. An error occurs when one of the arrays is empty:

Can only star expand struct data types. Attribute: `ArrayBuffer)`;

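Note that the elements of the arrays are of struct type. My goal is to pick the distinct elements across the different arrays.

So how should I handle this empty case? I would appreciate any suggestions.

1 Answer:

Answer 0: (score: 1)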

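When you explode the array with LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A:

  • exploded_a_values becomes the alias of the generated table
  • A becomes the column that holds the exploded elements

Because A is a column rather than a table, A.* is treated as a struct expansion and fails whenever A is not resolved as a struct. Select exploded_a_values.* instead, which always works.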
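To make the table-alias versus column-alias distinction concrete, here is a minimal sketch. The view name alias_demo and the aliases t and c are hypothetical and chosen only for illustration; the array holds plain integers so that the column star cannot be expanded as a struct. It assumes Spark is on the classpath (for example, pasted into spark-shell).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit}
import scala.util.Try

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("lateral-view-alias")
  .getOrCreate()

// Hypothetical view with an array of plain integers.
spark.range(1, 3)
  .withColumn("arr", array(lit(1), lit(2)))
  .createOrReplaceTempView("alias_demo")

// Star on the table alias `t`: expands to the generated column `c`.
val viaTableAlias = spark.sql(
  "SELECT t.* FROM alias_demo LATERAL VIEW explode(arr) t AS c")

// Star on the column alias `c`: Spark attempts a struct expansion,
// but `c` is an int, so analysis fails and Try captures the exception.
val viaColumnAlias = Try(spark.sql(
  "SELECT c.* FROM alias_demo LATERAL VIEW explode(arr) t AS c"))

println(viaTableAlias.columns.mkString(", "))
println(viaColumnAlias.isFailure)
```

The second statement is wrapped in Try because spark.sql analyzes the query eagerly, so the failure surfaces at that call rather than when an action runs.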

The modified query therefore looks like this:

1. Create the input data

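    import org.apache.spark.sql.functions.{array, lit}
    import org.apache.spark.sql.types.IntegerType

    // Array_0 holds two ints; Array_1 holds a single null element
    val table_0 = spark.range(1, 5)
      .withColumn("Array_0", array(lit(1), lit(2)))
      .withColumn("Array_1", array(lit(null).cast(IntegerType)))
    table_0.show(false)
    table_0.printSchema()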

Output (Spark 2.4 prints the single null element of Array_1 as an empty []):

+---+-------+-------+
|id |Array_0|Array_1|
+---+-------+-------+
|1  |[1, 2] |[]     |
|2  |[1, 2] |[]     |
|3  |[1, 2] |[]     |
|4  |[1, 2] |[]     |
+---+-------+-------+

root
 |-- id: long (nullable = false)
 |-- Array_0: array (nullable = false)
 |    |-- element: integer (containsNull = false)
 |-- Array_1: array (nullable = false)
 |    |-- element: integer (containsNull = true)

2. Run the union query

    table_0.createOrReplaceTempView("table_0")

    val processed = spark.sql(
      """
        |SELECT exploded_a_values.*, table_0.id
        |FROM table_0
        |    LATERAL VIEW explode(table_0.Array_0) exploded_a_values AS A
        |UNION
        |SELECT exploded_b_values.*, table_0.id
        |FROM table_0
        |    LATERAL VIEW explode(table_0.Array_1) exploded_b_values AS B
      """.stripMargin)
    processed.show(false)
    processed.printSchema()

Output:

+----+---+
|A   |id |
+----+---+
|2   |2  |
|2   |4  |
|null|2  |
|null|4  |
|1   |1  |
|2   |1  |
|1   |2  |
|1   |3  |
|2   |3  |
|1   |4  |
|null|1  |
|null|3  |
+----+---+

root
 |-- A: integer (nullable = true)
 |-- id: long (nullable = false)

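Note: UNION requires both sides to have the same number of columns with compatible types. Columns are matched by position, which is why the result column is named A.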
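For comparison, the same union can be sketched with the DataFrame API instead of SQL. This is not part of the original answer; the names input, left, right, and combined are illustrative. Note that Dataset.union keeps duplicates (it behaves like UNION ALL), so distinct() is added here to mimic SQL UNION.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit}
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("explode-union-df")
  .getOrCreate()

// Same shape as table_0 in the answer: ids 1 to 4 plus two int arrays.
val input = spark.range(1, 5)
  .withColumn("Array_0", array(lit(1), lit(2)))
  .withColumn("Array_1", array(lit(null).cast(IntegerType)))

// Explode each array into its own two-column DataFrame.
val left = input.select(explode(col("Array_0")).as("A"), col("id"))
val right = input.select(explode(col("Array_1")).as("B"), col("id"))

// union matches columns by position and keeps the names of the left side;
// distinct() removes duplicates the way SQL UNION does.
val combined = left.union(right).distinct()

combined.show(false)
combined.printSchema()
```

Because the columns are matched by position, the alias B on the right side does not appear in the result.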

Edit-1 (based on the comments)

I tried this with Array<struct> and the same query worked fine for me. The result is as follows:

+------+---+
|A     |id |
+------+---+
|[a, 2]|1  |
|[a, 2]|2  |
|[a, 2]|4  |
|null  |2  |
|null  |4  |
|[a, 2]|3  |
|null  |1  |
|null  |3  |
+------+---+

root
 |-- A: struct (nullable = true)
 |    |-- f1: string (nullable = false)
 |    |-- f2: integer (nullable = false)
 |-- id: long (nullable = false)

For the complete example, see this gist.
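The gist is not reproduced here, so the following is only a guess at an input that would produce the Edit-1 output above. The view name struct_table, the struct literal ("a", 2), and the cast string are assumptions inferred from the printed schema (fields f1 and f2), not taken from the original gist.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, lit, struct}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("explode-union-struct")
  .getOrCreate()

// Assumed input: Array_0 holds one struct, Array_1 holds one null struct.
spark.range(1, 5)
  .withColumn("Array_0", array(struct(lit("a").as("f1"), lit(2).as("f2"))))
  .withColumn("Array_1", array(lit(null).cast("struct<f1:string,f2:int>")))
  .createOrReplaceTempView("struct_table")

// Same query shape as in step 2, using the table aliases for the star.
val structResult = spark.sql(
  """
    |SELECT exploded_a_values.*, struct_table.id
    |FROM struct_table
    |    LATERAL VIEW explode(struct_table.Array_0) exploded_a_values AS A
    |UNION
    |SELECT exploded_b_values.*, struct_table.id
    |FROM struct_table
    |    LATERAL VIEW explode(struct_table.Array_1) exploded_b_values AS B
  """.stripMargin)

structResult.show(false)
structResult.printSchema()
```

Casting the null literal to the struct type gives both arrays the same element type, which is what lets the two sides of the UNION line up.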