Writing a DataFrame to a Parquet file fails with an empty or nested empty schema

Date: 2019-08-25 09:19:20

Tags: scala apache-spark amazon-s3 apache-spark-sql parquet

I am new to both Scala and Spark, so this may be a silly question. I have a DataFrame created from Elasticsearch, and I am trying to write it to S3 in Parquet format. Below is my code block and the error I see. Could some good Samaritan please help me out of this mess?

    val dfSchema = dataFrame.schema.json
    // log.info(dfSchema)
    dataFrame
      .withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
      .write
      .partitionBy("lastFound")
      .mode("append")
      .format("parquet")
      .option("schema", dfSchema)
      .save("/tmp/elasticsearch/")
org.apache.spark.sql.AnalysisException: 
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
         ;
    at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
    at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)

1 Answer:

Answer 0 (score: 1)

You do not need to supply a schema when writing data in Parquet format; the schema is derived from the DataFrame itself.

Append mode assumes that data is already stored at that exact path and that you want to add new data to it. If you want to overwrite instead, use "overwrite" rather than "append"; if the path is new, you do not need to set a mode at all.
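As a minimal sketch of the three common save modes described above, assuming a `SparkSession` is already available and `dataFrame` and `outputPath` exist in scope (the path name here is a placeholder):

```scala
import org.apache.spark.sql.SaveMode

val outputPath = "/tmp/elasticsearch/" // placeholder path

// Append: add new Parquet files alongside whatever is already at the path.
dataFrame.write.mode(SaveMode.Append).parquet(outputPath)

// Overwrite: replace any existing data at the path.
dataFrame.write.mode(SaveMode.Overwrite).parquet(outputPath)

// ErrorIfExists (the default): fail if the path already contains data,
// which is the safe choice when the path is expected to be new.
dataFrame.write.mode(SaveMode.ErrorIfExists).parquet(outputPath)
```

The string forms `"append"` and `"overwrite"` passed to `.mode(...)` are equivalent to the `SaveMode` constants.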

When you write to S3, the path should generally look like "s3://bucket/folder".

Can you try this:

 dataFrame
    .withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
    .write
    .partitionBy("lastFound")
    .mode("append")
    .parquet("/tmp/elasticsearch/")
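Since the original goal was to write to S3 rather than a local path, the same write can be pointed at a bucket; this is a sketch where the bucket and folder names are placeholders, and the exact scheme (`s3://`, `s3a://`, or `s3n://`) depends on your Hadoop/EMR setup:

```scala
// Same write as above, targeting S3. "my-bucket" and "elasticsearch"
// are placeholder names; credentials and the S3 filesystem connector
// must be configured in the Spark/Hadoop environment.
dataFrame
  .withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
  .write
  .partitionBy("lastFound")
  .mode("append")
  .parquet("s3://my-bucket/elasticsearch/")
```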