Spark SQL coalesce function cannot be evaluated

Asked: 2018-03-27 21:28:44

Tags: apache-spark apache-spark-sql coalesce

I'm doing an outer join between a source dataframe and a smaller "override" dataframe, and I want to use the coalesce function:

    // Dimension columns pass through from the ETL dataframe; attribute columns
    // prefer the override value, falling back to the ETL value.
    val outputColumns: Array[Column] = dimensionColumns.map(dc => etlDf(dc)).union(attributeColumns.map(ac => coalesce(overrideDf(ac), etlDf(ac))))
    etlDf.join(overrideDf, childColumns, "left").select(outputColumns:_*)

When the resulting dataframe is written to a Parquet file, I get the following exception:

org.apache.spark.sql.AnalysisException: Attribute name "coalesce(top_customer_fg, top_customer_fg)" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
    at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:581)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(ParquetSchemaConverter.scala:567)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:431)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:431)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.setSchema(ParquetWriteSupport.scala:431)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.prepareWrite(ParquetFileFormat.scala:115)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:108)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
    at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:484)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:520)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
    at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:494)
    at com.mycompany.customattributes.ProgramImplementation$StandardProgram.createAttributeFiles(ProgramImplementation.scala:63)

So even though the coalesce function returns a Column, its expression text ends up being used as the literal output column name. That seems unexpected to me.
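
The name in the exception shows what's happening: an unaliased function column is named after its pretty-printed expression, and Parquet rejects the parentheses and comma in that name. A minimal illustration of the naming behavior, runnable in spark-shell (this example is mine, not from the original post):

    import org.apache.spark.sql.functions.coalesce
    import spark.implicits._ // for the $"..." column syntax

    // The schema of an unaliased coalesce column carries the expression text
    // as the field name, which the Parquet writer then refuses.
    spark.range(1)
      .select(coalesce($"id", $"id"))
      .schema.fieldNames // Array("coalesce(id, id)")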

Am I making a syntax error here, or do I need to take a different approach?

Thanks.
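
The exception's own suggestion ("Please use alias to rename it") points at the usual fix: alias each coalesced column back to a plain name before writing. A minimal sketch, assuming the same dataframes and column lists as the snippet above; the imports and the output path are assumptions, not from the original post:

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.coalesce

    // Rename each coalesced column back to the plain attribute name so the
    // Parquet writer sees e.g. "top_customer_fg" rather than
    // "coalesce(top_customer_fg, top_customer_fg)".
    val outputColumns: Array[Column] =
      dimensionColumns.map(dc => etlDf(dc)) ++
        attributeColumns.map(ac => coalesce(overrideDf(ac), etlDf(ac)).as(ac))

    etlDf.join(overrideDf, childColumns, "left")
      .select(outputColumns: _*)
      .write.parquet("/path/to/output") // hypothetical output path

The same applies to any function-generated column (concat, when/otherwise, and so on): Parquet field names may not contain " ,;{}()\n\t=", so anything headed for a Parquet file needs a plain alias.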

0 Answers:

No answers yet.