Spark.read.csv error: java.io.IOException: Permission denied

Asked: 2017-02-08 11:42:26

Tags: apache-spark apache-spark-sql apache-spark-2.0

I am using Spark v2.0 and trying to read a csv file with:

spark.read.csv("filepath")

but I get the following error:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
  at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
  at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
  at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
  ... 48 elided
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
  ... 71 more
Caused by: java.io.IOException: Permission denied
  at java.io.UnixFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
  ... 71 more

I have also tried using .format("csv").csv("filepath"), but that gives the same result.

2 Answers:

Answer 0 (score: 1)

If you look at the last part of the exception stack trace, you'll find that this error is not about lacking access permissions to the file at "filepath".

I ran into a similar problem while using the Spark shell on a Windows client. This is the error I got:

  at java.io.WinNTFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)

Notice how the stack trace says WinNTFileSystem (where yours says UnixFileSystem), which made me look at the trace more closely. I realized that the current user had no permission to create a temp file locally. More specifically, org.apache.hadoop.hive.ql.session.SessionState tries to create a temp file in Hive's local scratch directory. If the current user doesn't have sufficient permission to do that, you get this error.
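The failing operation at the bottom of the stack trace is an ordinary File.createTempFile call, so the diagnosis can be reproduced outside Spark. A minimal sketch (the class name and the directory being checked are my own choices, not anything from Spark or Hive):

```java
import java.io.File;
import java.io.IOException;

public class TmpDirCheck {
    // Returns true if the given directory allows temp-file creation,
    // i.e. the same operation SessionState.createTempFile performs.
    public static boolean canCreateTempFile(File dir) {
        try {
            File f = File.createTempFile("spark-check-", ".tmp", dir);
            f.delete();
            return true;
        } catch (IOException e) {
            // This is the "java.io.IOException: Permission denied" path
            // seen at the bottom of the stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(tmp + " writable: " + canCreateTempFile(tmp));
    }
}
```

Running this as the same user that launches the Spark shell, pointed at Hive's scratch directory, confirms whether permissions are the culprit.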

In my case, on Windows, I realized I had to "Run as administrator" the command prompt used to launch the Spark shell. That worked for me.

In your case, on Unix, I expect that either running with sudo, updating the Hive configuration to use a different local scratch directory, or relaxing the permissions on the existing directory should do the trick.
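A sketch of those Unix-side fixes, assuming the common default Hive scratch directory /tmp/hive (check hive.exec.scratchdir in your hive-site.xml; all paths here are placeholders):

```shell
# Option 1: make the default Hive scratch directory writable by everyone
sudo chmod -R 777 /tmp/hive

# Option 2: point Spark/Hive at directories the current user owns
spark-shell \
  --conf spark.sql.warehouse.dir=/home/myuser/spark-warehouse \
  --conf spark.hadoop.hive.exec.scratchdir=/home/myuser/hive-scratch
```

Option 1 is the quick fix; option 2 avoids world-writable directories on shared machines.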

Answer 1 (score: 1)

Try this code, it may help.

Read data from CSV:
Dataset<Row> src = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .load("Source_new.csv");

Write data to CSV:

src.write()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .save("LowerCaseData.csv");
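Since the question is about Spark 2.0, the built-in csv source can be used directly instead of the external com.databricks.spark.csv package. A sketch reusing the file names from the answer above (note that csv() on write produces a directory of part files, and this requires a Spark installation to run):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class NativeCsvExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("native-csv-example")
                .master("local[*]")   // assumption: local run for illustration
                .getOrCreate();

        // Spark 2.x ships a native csv source; no external package needed.
        Dataset<Row> src = spark.read()
                .option("header", "true")
                .csv("Source_new.csv");

        src.write()
                .option("header", "true")
                .csv("LowerCaseData.csv");   // written as a directory of part files

        spark.stop();
    }
}
```

If this still fails with the original Permission denied error, the cause is the Hive scratch directory issue described in the first answer, not the CSV reader itself.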