Renaming columns in a DataFrame using selectExpr while flattening programmatically

Asked: 2017-08-03 07:30:42

Tags: scala spark-dataframe

I am using the code from the link below to flatten a nested DataFrame: Flatten a DataFrame in Scala with different DataTypes inside .... I am getting the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'alternateIdentificationQualifierCode' is ambiguous, could be: alternateIdentificationQualifierCode#2, alternateIdentificationQualifierCode#11;
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:287)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:171)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4$$anonfun$26.apply(Analyzer.scala:470)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4$$anonfun$26.apply(Analyzer.scala:470)
        at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:470)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:466)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
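The AnalysisException occurs because flattening keeps only the leaf field name, so two nested structs that both contain a field named alternateIdentificationQualifierCode collide after the flatten step. A minimal plain-Scala illustration of the collision (the paths here are hypothetical, not from the actual schema):

```scala
object AmbiguityDemo {
  // Flattening a nested schema with selectExpr keeps only the leaf
  // field name, so paths that share a leaf collide afterwards.
  def leafName(path: String): String = path.split("\\.").last

  // Returns true when two or more paths would flatten to the same name.
  def hasDuplicates(paths: Seq[String]): Boolean = {
    val leaves = paths.map(leafName)
    leaves.distinct.size != leaves.size
  }

  def main(args: Array[String]): Unit = {
    // hypothetical paths mirroring the ambiguity in the error message
    val paths = Seq(
      "primary.alternateIdentificationQualifierCode",
      "secondary.alternateIdentificationQualifierCode"
    )
    println(hasDuplicates(paths)) // both flatten to the same leaf -> true
  }
}
```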

Is there any way to programmatically rename the columns of a Spark DataFrame in Scala? Thanks in advance.

Code:

import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object flatten {

  def main(args: Array[String]): Unit = {

    if (args.length < 1) {
      System.err.println("Usage: XMLParser.jar <config.properties>")
      println("Please provide the Configuration File for the XML Parser Job")
      System.exit(1)
    }

    val sc = new SparkContext(new SparkConf().setAppName("Spark XML Process"))
    val sqlContext = new HiveContext(sc)
    val prop = new Properties()
    prop.load(new FileInputStream(args(0)))
    val dfSchema = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", prop.getProperty("xmltag"))
      .load(prop.getProperty("input"))
    // flattenDf comes from the linked answer
    val flattened_DataFrame = flattenDf(dfSchema)

    // flattened_DataFrame.printSchema()

  }
}

1 Answer:

Answer 0 (score: 1):

Use

val renamed_df = df.toDF(Seq("col1", "col2", "col3"): _*)

to rename the columns. (Note that toDF takes varargs, so a Seq must be expanded with `: _*`.)
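Since the question asks for a programmatic rename, the new names can be derived from the flattened column paths rather than written out by hand, e.g. by replacing "." with "_" so each column keeps its full path as a legal, unambiguous flat name. A minimal sketch (the paths and the flat_df reference are hypothetical):

```scala
object RenameDemo {
  // Keep the full nested path in the flat name so duplicate leaf
  // names (the cause of the AnalysisException) cannot collide.
  def flatName(path: String): String = path.replace(".", "_")

  def main(args: Array[String]): Unit = {
    val paths = Seq("header.id", "body.id", "body.code")
    println(paths.map(flatName).mkString(","))
    // Against a real flattened DataFrame this would be applied as
    // (untested sketch):
    //   val renamed_df = flat_df.toDF(flat_df.columns.map(flatName): _*)
  }
}
```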