I want to save my DataFrame in CSV format. It's a small dataset, so I'm using coalesce(1):
df.coalesce(1).write.mode(SaveMode.Overwrite).csv(outputPath + "/test.csv")
I get this error:
Caused by: java.io.IOException: File already exists:s3://test/test.csv/part-00000-c9f8a000-2601-4b83-a6d6-a3f023937fdc-c000.csv
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.create(S3NativeFileSystem.java:617)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:915)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:896)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:793)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.create(EmrFileSystem.java:176)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:81)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVFileFormat.scala:135)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon$1.newInstance(CSVFileFormat.scala:77)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:305)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:314)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
However, I can save the same DataFrame as a Parquet file without any errors:
df.write.mode(SaveMode.Overwrite).parquet(outputPath + "/test")
How can I fix this and save my DataFrame as CSV?
Answer (score: 0)
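Notice that you save the CSV to outputPath + "/test.csv" but the Parquet to outputPath + "/test". Try removing the file extension; the rest of the syntax looks correct. Keep in mind that csv() treats the path you give it as a directory and writes part-*.csv files inside it, so with "/test.csv" you end up with a directory named test.csv, as the path in your error message (s3://test/test.csv/part-00000-...) shows.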
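A minimal sketch of the suggested change, assuming spark-sql is on the classpath and running against a local temp directory instead of S3. The object name WriteCsvExample, the helper writeCsv, and the two-row sample data are hypothetical stand-ins for your own job. It writes to outputPath + "/test" (no .csv suffix, mirroring the Parquet call) and then lists the single part-*.csv file that coalesce(1) produces inside that directory.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object WriteCsvExample {
  // Hypothetical helper: writes a small DataFrame as CSV into the
  // directory `<outputPath>/test` (no ".csv" suffix) and returns that path.
  def writeCsv(spark: SparkSession, outputPath: String): String = {
    import spark.implicits._
    // Hypothetical sample data standing in for the real DataFrame.
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val target = outputPath + "/test"
    df.coalesce(1) // one partition -> one part file
      .write
      .mode(SaveMode.Overwrite)
      .csv(target) // `target` is created as a directory
    target
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-example").getOrCreate()
    try {
      // A local temp directory stands in for the S3 outputPath.
      val base = Files.createTempDirectory("csv-out").toString
      val dir = writeCsv(spark, base)
      // List the visible part-*.csv files (hidden .crc files start with ".").
      val parts = new java.io.File(dir).listFiles()
        .map(_.getName)
        .filter(n => n.startsWith("part-") && n.endsWith(".csv"))
      println(parts.mkString(", "))
    } finally spark.stop()
  }
}
```

Spark always names the data file part-...; if you need a single file literally called test.csv, that is a separate rename step after the write.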