I am new to EMR and Spark. I wrote a simple Spark job in Java and built a fat (uber) JAR. I SSH'd into the EMR cluster and ran the program with `java -jar mySampleEMRJob.jar`. It ran as expected.
However, when I run the same program with `/usr/lib/hadoop/bin/hadoop jar mySampleEMRJob.jar`, I get the following error:
NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()
The full stack trace is shown below.
My questions are:
1) Why does `java -jar` succeed on the same EMR cluster while `hadoop jar` fails?
2) How can I fix this Netty issue?
19/05/31 06:44:02 INFO spark.SparkContext: Running Spark version 2.4.2
19/05/31 06:44:02 INFO spark.SparkContext: Submitted application: SparkJob
19/05/31 06:44:03 INFO spark.SecurityManager: Changing view acls to: hadoop
19/05/31 06:44:03 INFO spark.SecurityManager: Changing modify acls to: hadoop
19/05/31 06:44:03 INFO spark.SecurityManager: Changing view acls groups to:
19/05/31 06:44:03 INFO spark.SecurityManager: Changing modify acls groups to:
19/05/31 06:44:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()I
at org.apache.spark.network.util.NettyUtils.createPooledByteBufAllocator(NettyUtils.java:113)
at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:106)
at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at SparkJob.main(SparkJob.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Answer (score: 0)
I ran into the same error when running a Spark application on an EMR cluster via AWS Data Pipeline.
Since your code also uses Spark, it is important to execute the jar with `spark-submit`, for example:
spark-submit --class com.company.acme.Main s3://mybucket/emr/sparkapp.jar
Otherwise, running the jar directly (e.g. with `hadoop jar`) uses the wrong classpath, which does not contain the correct versions of the dependency jars. Most likely, Hadoop's own, older Netty jar shadows the one bundled in your fat jar, which is why `PooledByteBufAllocator.defaultNumHeapArena()` cannot be found.
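To confirm which jar a conflicting class is actually loaded from, a small diagnostic like the following can help. This is a sketch, not part of the original answer; the default class name is taken from the stack trace above, and Netty will only resolve if it is on your classpath:

```java
// Minimal diagnostic sketch: print which jar on the classpath supplies a class.
// Useful for NoSuchMethodError cases caused by conflicting jar versions.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        // Defaults to the Netty class from the stack trace; pass another name to check it.
        String className = args.length > 0 ? args[0]
                : "io.netty.buffer.PooledByteBufAllocator";
        Class<?> cls = Class.forName(className);
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        // JDK core classes come from the bootstrap loader and have no code source.
        System.out.println(className + " loaded from: "
                + (src == null ? "bootstrap classloader" : src.getLocation()));
    }
}
```

Running this class once under `java -jar` and once under `hadoop jar` should print two different jar locations for the Netty class, confirming the classpath conflict.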