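I have a DSE cluster with 3 machines: 1, 2 and 3.
When I submit an application to the master, this is what happens, if I understand correctly:
So we end up with this configuration in the cluster:
When Spark picks worker 1 (the master) for the driver, everything runs fine. But when Spark assigns the driver to worker 2 (slave) or worker 3 (slave), it tries to bind to the master's IP and fails every time: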
INFO 16:20:45 Changing view acls to: cassandra
INFO 16:20:45 Changing modify acls to: cassandra
INFO 16:20:45 SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cassandra); users with modify permissions: Set(cassandra)
INFO 16:20:45 Slf4jLogger started
ERROR 16:20:46 failed to bind to /10.1.1.1:0, shutting down Netty transport
WARN 16:20:46 Service 'Driver' could not bind on port 0. Attempting port 1.
INFO 16:20:46 Slf4jLogger started
ERROR 16:20:46 failed to bind to /10.1.1.1:0, shutting down Netty transport
WARN 16:20:46 Service 'Driver' could not bind on port 0. Attempting port 1.
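The bind error names 10.1.1.1, which would only be bindable on the machine that actually owns that address. A minimal sketch for confirming this on a worker, assuming a POSIX shell: the helper names `bind_addr_from_log` and `is_local_addr` are hypothetical, and the address list passed in is an example for worker 2 (on many Linux systems the real list could come from `hostname -I`).

```shell
# Extract the address from a "failed to bind to /ADDR:PORT" log line.
bind_addr_from_log() {
  printf '%s\n' "$1" | sed -n 's|.*failed to bind to /\([0-9.]*\):[0-9]*.*|\1|p'
}

# Succeed if address $1 appears in the whitespace-separated list $2.
is_local_addr() {
  for candidate in $2; do
    [ "$candidate" = "$1" ] && return 0
  done
  return 1
}

line='ERROR 16:20:46 failed to bind to /10.1.1.1:0, shutting down Netty transport'
addr="$(bind_addr_from_log "$line")"

# Example address list for worker 2 (an assumption, not taken from the cluster).
if is_local_addr "$addr" "10.1.1.2 127.0.0.1"; then
  echo "$addr is local"
else
  echo "$addr is not assigned to this node"
fi
```

If the address from the log is not in the node's own list, the driver is being told to bind to another machine's IP, which matches the failure on workers 2 and 3.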
The configuration on each node is very simple:
export SPARK_LOCAL_IP="10.1.1.1" # or .2 or .3
export SPARK_PUBLIC_DNS="xx.xx.xx.xx"
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=7080
export SPARK_DRIVER_HOST="10.1.1.1" # or .2 or .3
export SPARK_WORKER_INSTANCES=1
export SPARK_DRIVER_MEMORY="10G"
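Since the address differs per node, one way to avoid a hand-edited value being copied to the wrong machine is to derive it from the node's own addresses. This is only a sketch under assumptions: `pick_cluster_ip` is a hypothetical helper, the `10.1.1.*` pattern is taken from the addresses above, and the example list stands in for whatever the node reports (for instance via `hostname -I` on many Linux systems).

```shell
# Hypothetical helper: print the first address in list $1 that matches 10.1.1.*
pick_cluster_ip() {
  for candidate in $1; do
    case "$candidate" in
      10.1.1.*) printf '%s\n' "$candidate"; return 0 ;;
    esac
  done
  return 1
}

# Example address list for node 2 (an assumption, not read from the machine).
SPARK_LOCAL_IP="$(pick_cluster_ip "127.0.0.1 10.1.1.2")"
export SPARK_LOCAL_IP
echo "SPARK_LOCAL_IP=$SPARK_LOCAL_IP"
```

Whether this resolves the bind failure depends on where the driver actually takes its host from, so it is a way to rule out a stale per-node value rather than a confirmed fix.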
I tried setting spark.driver.port in spark-defaults.conf, but it had no effect.
Here is the submit command:
/usr/bin/dse spark-submit --properties-file production.conf --master spark://10.1.1.1:7077 --deploy-mode cluster --class "com.company.SignalIO" aggregation.jar 2015-6-1-00:00:00 2015-6-2-00:00:00 signal_table
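One thing that may be worth checking: the command passes `--properties-file production.conf`, and the spark-submit help text describes conf/spark-defaults.conf as the file that is read only when `--properties-file` is not specified. If `dse spark-submit` behaves the same way (an assumption), driver settings would have to live in production.conf to take effect. A small sketch for listing them, where `show_driver_props` is a hypothetical helper and the demo file contents are made up for illustration:

```shell
# Print any spark.driver.* entries found in the properties file $1.
show_driver_props() {
  grep -E '^[[:space:]]*spark\.driver\.' "$1" || echo "no spark.driver.* entries in $1"
}

# Demo against a throwaway file; these contents are invented, not the real production.conf.
tmp="$(mktemp)"
printf 'spark.driver.host 10.1.1.1\nspark.executor.memory 4g\n' > "$tmp"
show_driver_props "$tmp"
rm -f "$tmp"
```

Running `show_driver_props production.conf` on the submitting machine would show whether a driver host or port is being fixed there and forwarded along with the application.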
Any ideas?