Spark 3.0.0 error when loading data from S3A

Asked: 2020-07-21 16:01:12

Tags: java apache-spark netty

I tried upgrading from Spark 3.0.0-preview2 to Spark 3.0.0. When I run `spark.read.load("s3a:...")` I get the error below; the same code worked on the previous version.

20/07/21 08:57:06 INFO spark.SecurityManager: Changing view acls to: ec2-user
20/07/21 08:57:06 INFO spark.SecurityManager: Changing modify acls to: ec2-user
20/07/21 08:57:06 INFO spark.SecurityManager: Changing view acls groups to: 
20/07/21 08:57:06 INFO spark.SecurityManager: Changing modify acls groups to: 
20/07/21 08:57:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ec2-user); groups with view permissions: Set(); users  with modify permissions: Set(ec2-user); groups with modify permissions: Set()
20/07/21 08:57:06 INFO client.TransportClientFactory: Successfully created connection to Master/10.0.179.117:39527 after 68 ms (0 ms spent in bootstraps)
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:283)
    at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
    at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:302)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$1(CoarseGrainedExecutorBackend.scala:311)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    ... 4 more
Caused by: org.apache.spark.SparkException: Unsupported message RpcMessage(10.0.179.117:49308,RetrieveSparkAppConfig(0),org.apache.spark.rpc.netty.RemoteNettyRpcCallContext@516aceb5) from 10.0.179.117:49308
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$2(Inbox.scala:104)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receiveAndReply$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:205)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
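
For context, an S3A read like the one above requires the `hadoop-aws` and AWS SDK jars to match the Hadoop build that Spark ships with. A minimal `spark-defaults.conf` sketch, assuming a Spark distribution built against Hadoop 2.7.x (the versions here are illustrative, not taken from the question):

```properties
# Illustrative versions -- they must match the Hadoop version bundled with Spark.
spark.jars.packages       org.apache.hadoop:hadoop-aws:2.7.4,com.amazonaws:aws-java-sdk:1.7.4
# On Hadoop 2.7.x the S3A filesystem implementation often needs to be set explicitly.
spark.hadoop.fs.s3a.impl  org.apache.hadoop.fs.s3a.S3AFileSystem
```
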

I am currently using netty 4.1.47 and aws-java-sdk 1.7.4. Sorry I cannot provide more information, since I do not know where this error comes from.
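
One observation on the trace: the driver rejects a `RetrieveSparkAppConfig(0)` message, which Spark 3.0.0 executors send at registration. A possible explanation (an assumption on my part, not a confirmed diagnosis) is a mixed-version cluster, e.g. some nodes still running the preview build while others run 3.0.0. A tiny stand-alone helper to spot mismatched version strings collected from each node (the survey data below is hypothetical):

```python
def mismatched_versions(versions):
    """Given a mapping of host -> reported Spark version string, return the
    set of distinct versions when more than one is present, else an empty set."""
    distinct = set(versions.values())
    return distinct if len(distinct) > 1 else set()

# Hypothetical survey of a cluster where one worker was not upgraded.
reported = {
    "master":   "3.0.0",
    "worker-1": "3.0.0",
    "worker-2": "3.0.0-preview2",
}
print(mismatched_versions(reported))  # non-empty set -> versions disagree
```

If the set is non-empty, reinstalling the same Spark build on every node (and restarting the master and workers) would be the first thing to try.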

Edit:

Fail to execute line 2: data = spark.read.load("s3a://[some parquet file]")
Traceback (most recent call last):
  File "/tmp/1595350396558-0/zeppelin_python.py", line 153, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 2, in <module>
  File "/home/ec2-user/spark/python/pyspark/sql/readwriter.py", line 178, in load
    return self._df(self._jreader.load(path))
  File "/home/ec2-user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/ec2-user/spark/python/pyspark/sql/utils.py", line 137, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: <unprintable AnalysisException object>

0 Answers