I want to read a dataset from an S3 directory, apply some updates, and overwrite it at the same location. What I do is:
dataSetWriter.writeDf(
  finalDataFrame,
  destinationPath,
  destinationFormat,
  SaveMode.Overwrite,
  destinationCompression)
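
For context, the end-to-end flow looks roughly like this (a simplified sketch: the SparkSession setup, the placeholder "update" column, and using the plain DataFrameWriter instead of my dataSetWriter.writeDf wrapper are only illustrative):

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("overwrite-same-path").getOrCreate()

val destinationPath = "s3://processed/fullTableUpdated.parquet"

// Read the existing dataset from S3
val sourceDf = spark.read.parquet(destinationPath)

// Apply some updates (placeholder transformation)
val finalDataFrame = sourceDf.withColumn("updated", lit(true))

// Overwrite the same location the data was read from -- this is where the job fails
finalDataFrame.write
  .mode(SaveMode.Overwrite)
  .parquet(destinationPath)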
But my job fails with the following error message:
java.io.FileNotFoundException: No such file or directory 's3://processed/fullTableUpdated.parquet/part-00503-2b642173-540d-4c7a-a29a-7d0ae598ea4a-c000.parquet'
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
Why does this happen? Is there something about Overwrite mode that I'm missing?
Thanks