ERROR netty.NettyTransport: failed to bind to spark.master/172.28.128.3:0, shutting down Netty transport
15/03/16 04:08:50 WARN util.Utils: Service 'Driver' could not bind on port 0. Attempting port 1.
That ^^^ is the error I get in my slave's logs. I submit the job with spark-submit. It makes no sense to me, because the slave is able to connect to the master, as shown in the web UI. I thought I had the ports configured correctly, as shown below; this configuration is the same on every machine.
spark-env.sh
export SPARK_LOCAL_IP=$(ip addr | grep 'state UP' -A2 | tail -n1 | awk '{print $2}' | cut -f1 -d'/')
export SPARK_MASTER_IP=spark.master
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_WORKER_PORT=9919
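As a sanity check on the SPARK_LOCAL_IP line, the pipeline can be run on its own on each node to see exactly which address it resolves to (note that with more than one interface in 'state UP' it only keeps the last match, which may not be the 172.28.128.x address the master sees):

# prints whatever SPARK_LOCAL_IP would be set to on this node
ip addr | grep 'state UP' -A2 | tail -n1 | awk '{print $2}' | cut -f1 -d'/'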
spark-defaults.conf
spark.master spark://spark.master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://spark.master:8020/spark-log
spark.yarn.submit.file.replication 3
spark.app.name quant
spark.ui.port 4040
spark.driver.port 9929
spark.executor.port 9939
spark.driver.host spark.slave
This is on both my slave and master nodes. When I submit a job, I use this bash command =>
/usr/local/spark/bin/spark-submit --class dev.quant.App --deploy-mode cluster hdfs:///spark/my-app.jar
spark-env.sh and spark-defaults.conf are chmod 775, so they should be getting picked up.
My master's log shows:
15/03/16 04:08:51 INFO master.Master: Removing driver: driver-20150316040848-0002
15/03/16 04:08:54 INFO master.Master: akka.tcp://driverClient@spark.master:55303 got disassociated, removing it.
15/03/16 04:08:54 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://driverClient@spark.master:55303] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/03/16 04:08:54 INFO master.Master: akka.tcp://driverClient@spark.master:55303 got disassociated, removing it.
15/03/16 04:08:54 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40172.28.128.3%3A32995-5#678153583] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
From what I've read, launching in cluster mode isn't supported, which makes no sense to me since standalone Spark is supposed to be a cluster solution. So I also tried launching in client mode, and that gives me ClassNotFoundException: dev.quant.App, which also makes no sense, since my jar definitely contains that class and all dependencies are packaged together with it by the assembly. I've been trying to get this stupid thing set up for far too long; a break would be nice. Lastly, I have Scala 2.10.5 installed, and my app is packaged against 2.10.5, if that matters.
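For reference, the client-mode attempt is the same command with the deploy mode changed; the local jar path below is just illustrative (the cluster-mode command above used the hdfs:// copy):

/usr/local/spark/bin/spark-submit --class dev.quant.App --deploy-mode client /usr/local/spark/my-app.jar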
Answer 0 (score: 0)
spark.driver.host looks suspicious. I think it should be set to spark.master rather than spark.slave, or just remove that parameter entirely.
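Concretely, assuming the driver ends up running on the master node, that line in spark-defaults.conf would become:

spark.driver.host spark.master

or the line can simply be deleted, in which case Spark determines the driver's address on its own.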