I have deployed spark job server (version 0.6.2) on a remote machine that runs a YARN cluster managed by Cloudera (version 5.8.2), following the instructions given here. After deployment, when I try to start the server, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: akka.util.Helpers$.ConfigOps(Lcom/typesafe/config/Config;)Lcom/typesafe/config/Config;
    at akka.cluster.ClusterSettings.<init>(ClusterSettings.scala:28)
    at akka.cluster.Cluster.<init>(Cluster.scala:67)
    at akka.cluster.Cluster$.createExtension(Cluster.scala:42)
    at akka.cluster.Cluster$.createExtension(Cluster.scala:37)
    at akka.actor.ActorSystemImpl.registerExtension(ActorSystem.scala:654)
    at akka.actor.ExtensionId$class.apply(Extension.scala:79)
    at akka.cluster.Cluster$.apply(Cluster.scala:37)
    at akka.cluster.ClusterActorRefProvider.createRemoteWatcher(ClusterActorRefProvider.scala:66)
    at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:186)
    at akka.cluster.ClusterActorRefProvider.init(ClusterActorRefProvider.scala:58)
    at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
    at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
    at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at spark.jobserver.JobServer$.spark$jobserver$JobServer$$makeSupervisorSystem$1(JobServer.scala:128)
    at spark.jobserver.JobServer$$anonfun$main$1.apply(JobServer.scala:130)
    at spark.jobserver.JobServer$$anonfun$main$1.apply(JobServer.scala:130)
    at spark.jobserver.JobServer$.start(JobServer.scala:54)
    at spark.jobserver.JobServer$.main(JobServer.scala:130)
    at spark.jobserver.JobServer.main(JobServer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
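For context, the launch itself is just the start script run from the install directory; a minimal sketch of what was executed, assuming the INSTALL_DIR from local.sh below:

# on the remote (YARN) machine
cd /home/ubuntu/spark/deployed-job-server   # INSTALL_DIR from local.sh
./start_server.sh                           # wraps spark-submit, see script below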
Files on the server when start_server.sh is run:
local.conf:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  # master = "local[4]"
  # master = "mesos://vm28-hulk-pub:5050"
  master = "yarn-client"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jar-store-rootdir = /tmp/jobserver/jars
    context-per-jvm = true
    jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }
    # When using chunked transfer encoding with scala Stream job results, this is the size of each chunk
    result-chunk-size = 1m
  }

  # predefined Spark contexts
  # contexts {
  #   my-low-latency-context {
  #     num-cpu-cores = 1          # Number of cores to allocate. Required.
  #     memory-per-node = 512m     # Executor memory per node, -Xmx style eg 512m, 1G, etc.
  #   }
  #   # define additional contexts here
  # }

  # universal context configuration. These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2          # Number of cores to allocate. Required.
    memory-per-node = 512m     # Executor memory per node, -Xmx style eg 512m, 1G, etc.

    # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
    # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"

    # uris of jars to be loaded into the classpath for this context. Uris is a string list, or a string separated by commas ','
    # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]

    # If you wish to pass any settings directly to the sparkConf as-is, add them here in passthrough,
    # such as hadoop connection settings that don't use the "spark." prefix
    passthrough {
      #es.nodes = "192.1.1.1"
    }
  }

  # This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
  home = "/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.
akka {
  remote.netty.tcp {
    # This controls the maximum message size, including job results, that can be sent
    # maximum-frame-size = 10 MiB
  }
}
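(For reference, port = 8090 above is where the job server's REST API listens once it is up; a quick sanity check, assuming the host is reachable, would look something like the following. It never gets that far here, since the process dies during ActorSystem startup.)

curl http://<server-host>:8090/contexts   # list running Spark contexts
curl http://<server-host>:8090/jars       # list uploaded job jars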
start_server.sh:
#!/bin/bash
# Script to start the job server
# Extra arguments will be spark-submit options, for example
# ./server_start.sh --jars cassandra-spark-connector.jar
#
# Environment vars (note settings.sh overrides):
# JOBSERVER_MEMORY - defaults to 1G, the amount of memory (eg 512m, 2G) to give to job server
# JOBSERVER_CONFIG - alternate configuration file to use
# JOBSERVER_FG - launches job server in foreground; defaults to forking in background
echo 'Starting job server...'
set -e
get_abs_script_path() {
  pushd . >/dev/null
  cd "$(dirname "$0")"
  appdir=$(pwd)
  popd >/dev/null
}
get_abs_script_path
. $appdir/setenv.sh
GC_OPTS="-XX:+UseConcMarkSweepGC
-verbose:gc -XX:+PrintGCTimeStamps -Xloggc:$appdir/gc.out
-XX:MaxPermSize=512m
-XX:+CMSClassUnloadingEnabled "
# To truly enable JMX in AWS and other containerized environments, also need to set
# -Djava.rmi.server.hostname equal to the hostname in that environment. This is specific
# depending on AWS vs GCE etc.
JAVA_OPTS="-XX:MaxDirectMemorySize=$MAX_DIRECT_MEMORY \
-XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true"
# -Dcom.sun.management.jmxremote.port=9999 \
# -Dcom.sun.management.jmxremote.rmi.port=9999 \
# -Dcom.sun.management.jmxremote.authenticate=false \
# -Dcom.sun.management.jmxremote.ssl=false"
MAIN="spark.jobserver.JobServer"
PIDFILE=$appdir/spark-jobserver.pid
if [ -f "$PIDFILE" ] && kill -0 $(cat "$PIDFILE"); then
echo 'Job server is already running'
exit 1
fi
# Code added
echo App dir: $appdir
echo Conf file_path: $conffile
echo Spark home: $SPARK_HOME
echo Main class path: $MAIN
cmd='$SPARK_HOME/bin/spark-submit --class $MAIN --driver-memory $JOBSERVER_MEMORY
--conf "spark.executor.extraJavaOptions=$LOGGING_OPTS"
--driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES"
$@ $appdir/spark-job-server.jar $conffile'
# Code added
if [ -z "$JOBSERVER_FG" ]; then
eval $cmd &
echo $! > $PIDFILE
else
eval $cmd
fi
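For clarity, once the variables are substituted, the eval'd cmd above is effectively the following spark-submit call (SPARK_HOME and JOBSERVER_MEMORY come from local.sh / settings.sh; the remaining variables are set by setenv.sh, which is not shown):

/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/spark/bin/spark-submit \
  --class spark.jobserver.JobServer \
  --driver-memory 1G \
  --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS" \
  --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES" \
  $appdir/spark-job-server.jar $conffile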
Files on the local machine before deployment (identical to the copy on the server above, except that context-per-jvm is false here):
local.conf:
# Template for a Spark Job Server configuration file
# When deployed these settings are loaded when job server starts
#
# Spark Cluster / Job Server configuration
spark {
  # spark.master will be passed to each job's JobContext
  # master = "local[4]"
  # master = "mesos://vm28-hulk-pub:5050"
  master = "yarn-client"

  # Default # of CPUs for jobs to use for Spark standalone cluster
  job-number-cpus = 4

  jobserver {
    port = 8090
    jar-store-rootdir = /tmp/jobserver/jars
    context-per-jvm = false
    jobdao = spark.jobserver.io.JobFileDAO
    filedao {
      rootdir = /tmp/spark-job-server/filedao/data
    }
    # When using chunked transfer encoding with scala Stream job results, this is the size of each chunk
    result-chunk-size = 1m
  }

  # predefined Spark contexts
  # contexts {
  #   my-low-latency-context {
  #     num-cpu-cores = 1          # Number of cores to allocate. Required.
  #     memory-per-node = 512m     # Executor memory per node, -Xmx style eg 512m, 1G, etc.
  #   }
  #   # define additional contexts here
  # }

  # universal context configuration. These settings can be overridden, see README.md
  context-settings {
    num-cpu-cores = 2          # Number of cores to allocate. Required.
    memory-per-node = 512m     # Executor memory per node, -Xmx style eg 512m, 1G, etc.

    # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
    # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"

    # uris of jars to be loaded into the classpath for this context. Uris is a string list, or a string separated by commas ','
    # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]

    # If you wish to pass any settings directly to the sparkConf as-is, add them here in passthrough,
    # such as hadoop connection settings that don't use the "spark." prefix
    passthrough {
      #es.nodes = "192.1.1.1"
    }
  }

  # This needs to match SPARK_HOME for cluster SparkContexts to be created successfully
  home = "/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/spark"
}
# Note that you can use this file to define settings not only for job server,
# but for your Spark jobs as well. Spark job configuration merges with this configuration file as defaults.
akka {
  remote.netty.tcp {
    # This controls the maximum message size, including job results, that can be sent
    # maximum-frame-size = 10 MiB
  }
}
local.sh:
# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
DEPLOY_HOSTS="xx.xx.xxx.xxx"
APP_USER=ubuntu
APP_GROUP=ubuntu
# optional SSH Key to login to deploy server
SSH_KEY=/home/xx/xx/xx.pem
INSTALL_DIR=/home/ubuntu/spark/deployed-job-server
LOG_DIR=/var/log/deployed-job-server
PIDFILE=spark-jobserver.pid
JOBSERVER_MEMORY=1G
SPARK_VERSION=1.6.0
MAX_DIRECT_MEMORY=512M
SPARK_HOME=/opt/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/spark
SPARK_CONF_DIR=$SPARK_HOME/conf
# Only needed for Mesos deploys
SPARK_EXECUTOR_URI=/home/spark/spark-1.6.0.tar.gz
# Only needed for YARN running outside of the cluster
# You will need to COPY these files from your cluster to the remote machine
# Normally these are kept on the cluster in /etc/hadoop/conf
# YARN_CONF_DIR=/pathToRemoteConf/conf
# HADOOP_CONF_DIR=/pathToRemoteConf/conf
#
# Also optional: extra JVM args for spark-submit
# export SPARK_SUBMIT_OPTS+="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5433"
SCALA_VERSION=2.10.4 # or 2.11.6
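For completeness, the deployment itself followed the documented spark-jobserver flow; roughly the steps below, where the environment name local is an assumption that matches the file names above:

# on the local machine, inside the spark-jobserver checkout
cp config/local.sh.template config/local.sh      # then edited as shown above
cp config/local.conf.template config/local.conf  # then edited as shown above
bin/server_deploy.sh local   # copies the build to DEPLOY_HOSTS (via SSH_KEY) into INSTALL_DIR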
What is the reason behind this error, and how do I get rid of it?