Spark column renaming

Date: 2017-09-26 17:33:20

Tags: scala apache-spark apache-spark-sql

I'm just trying to understand why the `withColumnRenamed` call below doesn't work. I have no practical reason to do this; I'm only trying to understand why it fails:

val a = sqlContext.sql("msck repair table db_name.table_name")
a: org.apache.spark.sql.DataFrame = [result: string]
scala> a.show()
+------+
|result|
+------+
+------+


scala> a.printSchema
root
 |-- result: string (nullable = false)


scala> a.withColumnRenamed("result", "res")
org.apache.spark.sql.AnalysisException: resolved attribute(s) result#32 missing from result#22 in operator !Project [result#32 AS res#33];

But this works:

a.select($"result".as("res"))
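For what it's worth, other purely name-based renames should behave the same way as the `select` above, since the column name is not resolved to an attribute until analysis time. A hedged sketch, assuming the same Spark 1.6 shell session (I have not verified these beyond the `select` case shown):

```scala
// Name-based rename via a SQL expression string; like $"result".as("res"),
// the reference stays unresolved until the analyzer runs, so it should not
// carry a stale expression ID.
a.selectExpr("result as res")

// Equivalent using the col() function with an unresolved column name.
import org.apache.spark.sql.functions.col
a.select(col("result").as("res"))
```

The common thread is that both build an unresolved reference by name rather than capturing a pre-resolved attribute.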

I looked at the plans, and it seems the same column is assigned a different unique column ID at different stages of the plan. Shouldn't a given column's unique ID be the same throughout the plan?

a.explain(true)
== Parsed Logical Plan ==
HiveNativeCommand msck repair table db_name.table_name
== Analyzed Logical Plan ==
result: string
HiveNativeCommand msck repair table db_name.table_name
== Optimized Logical Plan ==
HiveNativeCommand msck repair table db_name.table_name
== Physical Plan ==
ExecutedCommand HiveNativeCommand msck repair table db_name.table_name
scala> a.queryExecution.logical.output
res76: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(result#140)
scala> a.queryExecution.analyzed.output
res77: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(result#141)
scala> a.queryExecution.optimizedPlan.output
res78: Seq[org.apache.spark.sql.catalyst.expressions.Attribute] = List(result#142)
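One way to probe the observation above further (a sketch for the same Spark 1.6 shell; I'm assuming the command node regenerates its output attributes on every call, which the differing IDs #140/#141/#142 suggest) is to read the `output` of a single, fixed plan stage twice and compare expression IDs:

```scala
// If HiveNativeCommand's output is a def that mints fresh
// AttributeReferences on each invocation, two reads of the *same*
// analyzed plan should report different expression IDs.
val ids1 = a.queryExecution.analyzed.output.map(_.exprId)
val ids2 = a.queryExecution.analyzed.output.map(_.exprId)
println(ids1 == ids2) // likely false if fresh IDs are minted per call
```

If the IDs differ even within one stage, that would explain the `AnalysisException`: `withColumnRenamed` resolves `result` to one attribute (e.g. `result#32`), but by the time the new projection is analyzed the child reports a freshly minted attribute (e.g. `result#22`), so the resolved reference no longer matches.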

I am using Spark 1.6.

0 Answers:

There are no answers yet.