Question

I am having a problem on an HDInsight Spark cluster with an application that needs two jars:

The first is JNR (com.github.jnr:jnr-constants:0.9.0).
The other is JNA (net.java.dev.jna:jna:4.1.0), which is required by the JRuby I use.

The problem is that whenever I run my application I get this error:

[Error] Exception java.lang.NoSuchMethodError : jnr.constants.platform.OpenFlags.defined()Z

If I remove the code that calls jnr, I get the same kind of problem with jna:

my.process.check.run.checkRun$.main(checkRun.scala:219): [Error] Exception java.lang.NoSuchMethodError : com.sun.jna.Platform.is64Bit()Z

(The is64Bit() method is not available in jna v3.5.1.)

I checked the worker nodes, and they only have this:

myClusterUser@wn0-Test:/$ find . -name '*jna*.jar' 2>/dev/null
./usr/lib/hdinsight-scpnet/scp/jvm/jna-3.5.1.jar
./usr/hdp/2.4.2.0-258/storm/extlib/jna-3.5.1.jar
myClusterUser@wn0-Test:/$ find . -name '*jnr*.jar' 2>/dev/null
myClusterUser@wn0-Test:/$

On the head node I have this:

myClusterUser@hn0-Test:/$ find . -name '*jna*.jar' 2>/dev/null
./usr/lib/hdinsight-scpnet/scp/jvm/jna-3.5.1.jar
./usr/hdp/2.4.2.0-258/storm/extlib/jna-3.5.1.jar
myClusterUser@hn0-Test:/$ find . -name '*jnr*.jar' 2>/dev/null
myClusterUser@hn0-Test:/$
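
The listings above show which jars exist on disk, not which jar the JVM actually loads a class from. A minimal sketch of a check that could be called from the driver or an executor is below. The class name WhichJar and the method locationOf are hypothetical helpers, not part of Spark, JNA or JNR, and the sketch assumes that the class loader that loaded the helper is the one that matters.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns where the named class was loaded from, or a short reason when unknown. */
    static String locationOf(String className) {
        try {
            // Resolve the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself usually have no code source.
            return src == null ? "no code source (JDK class)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Default to the two classes named in the NoSuchMethodError messages.
        String[] names = args.length > 0 ? args
                : new String[] {"com.sun.jna.Platform", "jnr.constants.platform.OpenFlags"};
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If the printed location for com.sun.jna.Platform were a jna-3.5.1.jar path rather than the application jar, that would point to an older copy taking precedence on the classpath.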

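I first tried building a "fat jar" with mvn assembly that bundles everything needed, including the correct version of jna (4.1.0) and jnr. I now have a 129 MB jar, but I get the same error.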

I then tried adding them to my spark-submit with the --packages option:

spark-submit \
--verbose \
--packages net.java.dev.jna:jna:4.1.0,com.github.jnr:jnr-constants:0.9.0,org.jruby:jruby:9.0.1.0,com.databricks:spark-csv_2.10:1.4.0 \
--conf spark.executor.extraClassPath=./ \
--conf spark.driver.maxResultSize=2g \
--conf spark.executor.memory=1500m \
--conf spark.yarn.executor.memoryOverhead=500 \
--conf spark.executor.instances=2 \
--conf spark.sql.shuffle.partitions=4 \
--conf 'spark.executor.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=512M' \
--conf 'spark.driver.extraJavaOptions=-XX:PermSize=512M -XX:MaxPermSize=512M' \
--deploy-mode cluster \
--master yarn-cluster \
--class my.process.check.run.checkRun \
wasb:///checkRun/my-checkRun-1.0.6-SNAPSHOT-jar-with-dependencies.jar \
--nostdin \
--nodb \
--LOG_LEVEL 0

The PermSize options are there to avoid out-of-memory problems, because HDInsight uses Java 7 rather than Java 8.

When I do this, I can see that every worker gets a copy of my-checkRun-1.0.6-SNAPSHOT-jar-with-dependencies.jar under yarn/local/filecache:

myClusterUser@wn3-Test:/$ find . -name '*checkRun*.jar' 2>/dev/null
./mnt/resource/hadoop/yarn/local/filecache/10/my-checkRun-1.0.6-SNAPSHOT-jar-with-dependencies.jar

That folder contains nothing besides this jar.

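I can also see that spark-submit retrieves the versions of the jars I specified with the --packages option and stores them in the local repository (m2).

It then places them in a temporary WASB (HDFS) folder, alongside a Spark conf archive.
In this archive I have __spark_conf__.properties: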

#Spark configuration.
#Mon Jul 11 13:18:01 UTC 2016
spark.executor.memory=1500m
spark.yarn.submit.file.replication=3
spark.yarn.jar=local\:///usr/hdp/current/spark-client/lib/spark-assembly.jar
spark.yarn.executor.memoryOverhead=500
spark.yarn.driver.memoryOverhead=384
spark.history.kerberos.keytab=none
spark.submit.deployMode=cluster
spark.yarn.secondary.jars=net.java.dev.jna_jna-4.1.0.jar,com.github.jnr_jnr-constants-0.9.0.jar
spark.yarn.scheduler.heartbeat.interval-ms=5000
spark.yarn.preserve.staging.files=false
spark.eventLog.enabled=true
spark.executor.extraClassPath=./
spark.yarn.queue=default
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port=18080
spark.yarn.historyServer.address=hn0-testr.su4ft5rezscepaqpicvo04xrkb.fx.internal.cloudapp.net\:18080
spark.master=yarn-cluster
spark.yarn.containerLauncherMaxThreads=25
spark.executor.cores=2
spark.yarn.max.executor.failures=3
spark.yarn.services=
spark.history.fs.logDirectory=wasb\:///hdp/spark-events
spark.sql.shuffle.partitions=4
spark.executor.extraJavaOptions=-XX\:PermSize\=512M -XX\:MaxPermSize\=512M
spark.executor.instances=2
spark.app.name=my.process.check.run.checkRun
spark.driver.maxResultSize=2g
spark.history.kerberos.principal=none
spark.driver.extraJavaOptions=-XX\:PermSize\=512M -XX\:MaxPermSize\=512M
spark.eventLog.dir=wasb\:///hdp/spark-events

As you can see, my additional jars are listed in the spark.yarn.secondary.jars parameter.

After the run, I can find more jna and jnr jars on the head node (nothing changes on the worker nodes):

myClusterUser@hn0-Test:/$ find . -name '*jnr*.jar' 2>/dev/null
./home/myClusterUser/.ivy2/cache/com.github.jnr/jnr-netdb/jars/jnr-netdb-1.1.4.jar
./home/myClusterUser/.ivy2/cache/com.github.jnr/jnr-posix/jars/jnr-posix-3.0.15.jar
./home/myClusterUser/.ivy2/cache/com.github.jnr/jnr-x86asm/jars/jnr-x86asm-1.0.2.jar
./home/myClusterUser/.ivy2/cache/com.github.jnr/jnr-enxio/jars/jnr-enxio-0.9.jar
./home/myClusterUser/.ivy2/cache/com.github.jnr/jnr-unixsocket/jars/jnr-unixsocket-0.8.jar
./home/myClusterUser/.ivy2/cache/com.github.jnr/jnr-constants/jars/jnr-constants-0.9.0.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jffi-1.2.9.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jnr-constants-0.9.0.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jnr-enxio-0.9.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jnr-x86asm-1.0.2.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jnr-netdb-1.1.4.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jnr-posix-3.0.15.jar
./home/myClusterUser/.ivy2/jars/com.github.jnr_jnr-unixsocket-0.8.jar

myClusterUser@hn0-Test:/$ find . -name '*jna*.jar' 2>/dev/null
./usr/lib/hdinsight-scpnet/scp/jvm/jna-3.5.1.jar
./usr/hdp/2.4.2.0-258/storm/extlib/jna-3.5.1.jar
./home/myClusterUser/.ivy2/cache/net.java.dev.jna/jna/jars/jna-4.1.0.jar
./home/myClusterUser/.ivy2/jars/net.java.dev.jna_jna-4.1.0.jar

I also tried removing all the jars covered by the --packages option, to make sure there was no conflict with them.

Does anyone know what I should do so that my application runs with the JNA and JNR jars I provide?

spark-submit in yarn-cluster mode on azure-hdinsight fails to provide additional jars (such as JNR) to my application

0 answers: