"构建LocalRelation"时发现未解析的属性,有人可以解释一下吗?

Time: 2017-07-29 15:45:57

Tags: scala apache-spark apache-spark-sql spark-dataframe

I have a dataframe:

+---+---+---+---+
|A  |B  |C  |D  |
+---+---+---+---+
|a  |b  |b  |c  |
+---+---+---+---+

I convert the columns to structs by doing the following:

import org.apache.spark.sql.functions._

val df = myDF.withColumn("colA", struct($"A", $"B"))
  .withColumn("colB", struct($"C".as("A"), $"D".as("B")))

The resulting dataframe and schema are:

+-----+-----+
|colA |colB |
+-----+-----+
|[a,b]|[b,c]|
+-----+-----+

root
 |-- colA: struct (nullable = false)
 |    |-- A: string (nullable = true)
 |    |-- B: string (nullable = true)
 |-- colB: struct (nullable = false)
 |    |-- A: string (nullable = true)
 |    |-- B: string (nullable = true)

I want to combine the two struct columns into a single column, so I do:

df.select(array(struct($"colA.A", $"colA.B"),struct($"colB.A", $"colB.B")).as("Result"))

which gives the correct dataframe and schema:

+--------------+
|Result        |
+--------------+
|[[a,b], [b,c]]|
+--------------+

root
 |-- Result: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- A: string (nullable = true)
 |    |    |-- B: string (nullable = true)

I can get the same result with:

df.select(array(struct($"A", $"B"),struct($"C".as("A"), $"D".as("B"))).as("Result"))

Now, looking at the whole process, we have the following equivalences:

$"colA" == struct($"A", $"B") == struct($"colA.A", $"colA.B")

$"colB" == struct($"C".as("A"), $"D".as("B")) == struct($"colB.A", $"colB.B")

But when I do

df.select(array($"colA", $"colB").as("Result"))

I get the following error:

  
    

requirement failed: unresolved attributes found when constructing LocalRelation.
java.lang.IllegalArgumentException: requirement failed: unresolved attributes found when constructing LocalRelation.
    at scala.Predef$.require(Predef.scala:219)
    at org.apache.spark.sql.catalyst.plans.logical.LocalRelation.<init>(LocalRelation.scala:50)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$33.applyOrElse(Optimizer.scala:1402)
    at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$33.applyOrElse(Optimizer.scala:1398)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
    .......
    ........

  

What does this error mean, and how should I fix it?
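
For anyone who wants to try this, here is a minimal self-contained sketch of the steps above. It assumes Spark 2.x running locally; the object name `StructArrayRepro`, the app name, and building `myDF` from a one-row `Seq` are illustrative assumptions and not part of the original code. Whether the last `select` fails may depend on the Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, struct}

// Hypothetical wrapper object, used only to make the snippet runnable on its own.
object StructArrayRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")               // assumption: local mode
      .appName("struct-array-repro")    // assumption: arbitrary app name
      .getOrCreate()
    import spark.implicits._

    // Assumption: the input dataframe is built from a single in-memory row.
    val myDF = Seq(("a", "b", "b", "c")).toDF("A", "B", "C", "D")

    // Same struct conversion as in the question.
    val df = myDF
      .withColumn("colA", struct($"A", $"B"))
      .withColumn("colB", struct($"C".as("A"), $"D".as("B")))

    // Variant reported to work: rebuild each struct from its fields.
    df.select(
      array(struct($"colA.A", $"colA.B"), struct($"colB.A", $"colB.B")).as("Result")
    ).show(false)

    // Variant reported to fail with "unresolved attributes found when
    // constructing LocalRelation": pass the struct columns directly.
    df.select(array($"colA", $"colB").as("Result")).show(false)

    spark.stop()
  }
}
```

The stack trace points at the optimizer rule `ConvertToLocalRelation`, so the failure appears to happen while the plan is being optimized, not while the columns are being resolved.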

0 answers:

No answers yet