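I'm new to both Scala and Spark, and I have a very basic question. I have a DataFrame created from Elasticsearch, and I'm trying to write it to S3 in Parquet format. Below are my code block and the error I see. Could a good Samaritan please explain this to me in simple terms?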
val dfSchema = dataFrame.schema.json
// log.info(dfSchema)
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.format("parquet")
.option("schema", dfSchema)
.save("/tmp/elasticsearch/")
org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
Answer 0 (score: 1)
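When writing data in Parquet format, you don't need to supply the schema.
Append mode assumes that data is already stored at that exact path and that you want to add new data to it. If you want to overwrite instead, use "overwrite" in place of "append"; if the path is new, you don't need to set a mode at all.
When you write to S3, the path should usually look like "s3://bucket/the folder".
Can you try: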
dataFrame
.withColumn("lastFound", functions.date_add(dataFrame.col("last_found"), -457))
.write
.partitionBy("lastFound")
.mode("append")
.parquet("/tmp/elasticsearch/")
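For anyone who wants to try the write outside the original job, here is a minimal, self-contained sketch. It assumes a local SparkSession and a small stand-in DataFrame in place of the Elasticsearch one; the `id` column, the sample dates, and the object name are hypothetical, not taken from the question. It also prints the schema first, since the error message asks you to make sure the data schema has at least one column.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, date_add, to_date}

object ParquetWriteSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimenting; a real job would use its own configuration.
    val spark = SparkSession.builder()
      .appName("parquet-write-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame loaded from Elasticsearch.
    val dataFrame = Seq(
      ("a", "2019-06-01"),
      ("b", "2019-06-02")
    ).toDF("id", "last_found")
      .withColumn("last_found", to_date(col("last_found")))

    // Check that the schema is not empty before writing.
    dataFrame.printSchema()

    // Local path from the question; on S3 this would be a bucket path instead.
    val outputPath = "/tmp/elasticsearch/"

    dataFrame
      .withColumn("lastFound", date_add(col("last_found"), -457))
      .write
      .partitionBy("lastFound")
      // SaveMode.Append adds to existing data; SaveMode.Overwrite replaces it.
      .mode(SaveMode.Append)
      .parquet(outputPath)

    spark.stop()
  }
}
```

If `printSchema()` shows no columns for the real DataFrame, the problem is probably in how the data is read from Elasticsearch rather than in the write itself.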