I am using Spark v2.0 and trying to read a csv file with:
spark.read.csv("filepath")
but I get the following error:
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
... 48 elided
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
... 71 more
Caused by: java.io.IOException: Permission denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2024)
at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
... 71 more
I have also tried using .format("csv").csv("filepath"), but that gives the same result.
Answer 0 (score: 1)
If you look at the last part of the exception stack trace, you'll see that this error is not about insufficient access to the file at "filepath".
I ran into a similar issue while using the Spark shell on a Windows client. This is the error I got:
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2024)
at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
Notice how it says WinNTFileSystem in the stack trace (where you have UnixFileSystem), which made me look at this stack trace more closely. I realized that the current user doesn't have permission to create a temp file locally. More specifically, org.apache.hadoop.hive.ql.session.SessionState tries to create a temp file in the Hive local scratch directory. If the current user doesn't have enough permissions to do that, you get this error.
For me, on Windows, the fix was to "Run as administrator" the Command Prompt used to launch the Spark shell. That worked for me.
For you, on Unix, I guess either sudo, or updating the Hive config to point the local scratch directory somewhere writable, or fixing the permissions on the directory your existing Hive config uses, should do the trick.
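On Linux, a minimal sketch of that fix, assuming Hive's local scratch directory is at its default location /tmp/hive (this is controlled by the hive.exec.local.scratchdir property in hive-site.xml; adjust the path if your config overrides it):

```shell
# Make sure the Hive local scratch directory exists and is writable
# (assumes the default location /tmp/hive; check hive.exec.local.scratchdir
# in hive-site.xml if Spark still fails after this).
mkdir -p /tmp/hive
chmod 777 /tmp/hive

# Sanity check: creating a temp file here is exactly what
# SessionState.createTempFile was failing to do.
mktemp /tmp/hive/hive_session.XXXXXX
```

After this, relaunch the Spark shell; the HiveSessionState initialization should no longer fail with Permission denied.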
Answer 1 (score: 1)
Try this code, it may help.
Reading data from CSV:
Dataset<Row> src = sqlContext.read()
.format("com.databricks.spark.csv")
.option("header", "true")
        .load("Source_new.csv");
Writing data to CSV:
src.write()
.format("com.databricks.spark.csv")
.option("header", "true")
.save("LowerCaseData.csv");
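Note that com.databricks.spark.csv above is the external spark-csv package. On Spark 1.x it had to be supplied at launch time; on Spark 2.0 (the version in the question) the csv data source is built in, so format("csv") or read().csv(...) works without any extra package. A sketch of the 1.x launch line, assuming a Scala 2.10 build (pick the artifact matching your Scala version):

```shell
# Only needed on Spark 1.x; Spark 2.x ships a built-in csv data source.
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
```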