Question

I write my word2vec model to S3 as follows:

model.save(sc, "s3://output/folder")

This usually works, so it is not an AWS credentials problem, but at random I get the following error.

17/01/30 20:35:21 WARN ConfigurationUtils: Cannot create temp dir with proper permission: /mnt2/s3
java.nio.file.AccessDeniedException: /mnt2
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
    at java.nio.file.Files.createDirectory(Files.java:674)
    at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
    at java.nio.file.Files.createDirectories(Files.java:767)
    at com.amazon.ws.emr.hadoop.fs.util.ConfigurationUtils.getTestedTempPaths(ConfigurationUtils.java:216)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.initialize(S3NativeFileSystem.java:447)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:111)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2717)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2751)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2733)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:377)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:113)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:88)
    at org.apache.parquet.hadoop.ParquetOutputCommitter.<init>(ParquetOutputCommitter.java:41)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getOutputCommitter(ParquetOutputFormat.java:339)

I have tried various clusters but have not managed to figure it out. Is this a known issue with PySpark?

Answer 1

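This may be related to SPARK-19247. As of today (Spark 2.1.0), ML writers repartition all data to a single partition, which can cause failures with large models. If that is indeed the source of the problem, you can try patching your distribution manually with the code from the corresponding PR.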
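If patching the distribution is not an option, and since the failure only shows up at random, a retry wrapper around the save call might serve as a stopgap. The sketch below is only an illustration: `save_with_retry` and its parameters are hypothetical names that are not part of PySpark, and it does nothing about the single-partition behavior described above.

```python
import time


def save_with_retry(save_fn, attempts=3, delay_seconds=5.0):
    """Call save_fn() until it succeeds or the attempts run out.

    save_fn is any zero-argument callable (hypothetical helper, not a
    PySpark API). The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return save_fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Wait a little longer after each failed attempt.
            time.sleep(delay_seconds * attempt)


# Possible usage with the call from the question (assumes `model` and `sc` exist):
# save_with_retry(lambda: model.save(sc, "s3://output/folder"))
```

A failed attempt may leave a partially written output folder behind, in which case a later attempt could fail because the path already exists, so the target may need to be cleaned up between retries.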

PySpark randomly fails to write to S3
