I am just trying to understand why the "withColumnRenamed" call below does not work. I have no practical reason to do this; I only want to understand why it fails:
val a = sqlContext.sql("msck repair table db_name.table_name")
a: org.apache.spark.sql.DataFrame = [result: string]
scala> a.show()
+------+
|result|
+------+
+------+
scala> a.printSchema
root
|-- result: string (nullable = false)
a.withColumnRenamed("result","res")
org.apache.spark.sql.AnalysisException: resolved attribute(s) result#32 missing from result#22 in operator !Project [result#32 AS res#33];
But this works:
a.select($"result".as("res"))
I looked at the plans, and it seems the same column is assigned a different unique column ID at different stages of the plan. Shouldn't the unique ID of a given column be the same throughout the plan?
a.explain(true)
== Parsed Logical Plan ==
HiveNativeCommand msck repair table db_name.table_name
== Analyzed Logical Plan ==
result: string
HiveNativeCommand msck repair table db_name.table_name
== Optimized Logical Plan ==
HiveNativeCommand msck repair table db_name.table_name
== Physical Plan ==
ExecutedCommand HiveNativeCommand msck repair table db_name.table_name
scala> a.queryExecution.logical.output
res76: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(result#140)
scala> a.queryExecution.analyzed.output
res77: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(result#141)
scala> a.queryExecution.optimizedPlan.output
res78: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(result#142)
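To check whether the IDs are stable at all, I would compare two calls against the same analyzed plan (just a quick check, I have not dug into how HiveNativeCommand defines its output):
val out1 = a.queryExecution.analyzed.output
val out2 = a.queryExecution.analyzed.output
out1.map(_.exprId) == out2.map(_.exprId)  // if this is false, output seems to produce fresh IDs on every call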
I am using Spark 1.6.
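For what it's worth, a workaround I would try (untested, and note that it forces the command to actually run so the result is pinned to a plain RDD-backed DataFrame):
val b = sqlContext.createDataFrame(a.rdd, a.schema)  // materializes the command output into an RDD-backed DataFrame
b.withColumnRenamed("result", "res")                 // I would expect stable attribute IDs here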