Connection refused when trying to connect to a remote Spark cluster: spark.example.com/xxx.xxx.xxx.xxx:7077

Date: 2018-08-31 04:41:48

Tags: apache-spark amazon-ec2 pyspark aws-security-group connection-refused

I have set up a Spark Standalone cluster on EC2 machines, consisting of 1 master and 2 worker nodes. When I try to submit a Spark job from my local machine to the remote master via the PySpark shell, I get a connection refused error.

On my local machine, trying to connect to the remote Spark master (an EC2 instance):

pyspark --master spark://spark.example.com:7077

Running the above command on my local machine produces the following error:

Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.example.com/52.66.70.6:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
    ... 11 more
2018-08-31 08:58:09 ERROR StandaloneSchedulerBackend:70 - Application has been killed. Reason: All masters are unresponsive! Giving up.
2018-08-31 08:58:09 WARN  StandaloneSchedulerBackend:66 - Application ID is not initialized yet.
2018-08-31 08:58:09 WARN  StandaloneAppClient$ClientEndpoint:66 - Drop UnregisterApplication(null) because has not yet connected to master
2018-08-31 08:58:09 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-08-31 08:58:10 ERROR SparkContext:91 - Error initializing SparkContext.

When I run the same command while logged into one of my Spark nodes, the connection is established successfully.

/etc/hosts file:

127.0.0.1 localhost
127.0.0.1 spark.example.com #Changing this to floating/Public IP throws "Cannot Bind to port 7077" error
127.0.0.1 slave1
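
This hosts file is the likely culprit: with spark.example.com resolving to 127.0.0.1, the master binds to the loopback interface and refuses remote connections (see the lsof output below). On EC2 the public IP is NAT-ed and cannot be bound directly, which would explain the "Cannot Bind to port 7077" error; the usual approach is to map the hostname to the instance's private IP instead. A sketch, with placeholder private IPs:

127.0.0.1      localhost
172.31.0.10    spark.example.com   # master's EC2 private IP (placeholder)
172.31.0.11    slave1              # worker's EC2 private IP (placeholder)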

spark-env.sh

MASTER_HOST=spark.example.com
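
For reference, the variable the standalone startup scripts actually read is SPARK_MASTER_HOST (see conf/spark-env.sh.template); a minimal spark-env.sh sketch:

export SPARK_MASTER_HOST=spark.example.com   # address the master binds to
export SPARK_MASTER_PORT=7077                # default master RPC port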

The EC2 inbound security group is configured to allow "All traffic" on "All ports" from "Anywhere on the Internet".

Below is the lsof output after logging into my Spark master node (EC2 instance):

lsof -i :7077
COMMAND   PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    20671 ubuntu  237u  IPv6  79763      0t0  TCP localhost:7077 (LISTEN)
java    20671 ubuntu  249u  IPv6  80993      0t0  TCP localhost:7077->localhost:42553 (ESTABLISHED)
java    20671 ubuntu  250u  IPv6  80994      0t0  TCP localhost:7077->localhost:42554 (ESTABLISHED)
java    20910 ubuntu  252u  IPv6  80992      0t0  TCP localhost:42554->localhost:7077 (ESTABLISHED)
java    20912 ubuntu  251u  IPv6  80991      0t0  TCP localhost:42553->localhost:7077 (ESTABLISHED)
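
Note that the listener is bound to localhost:7077, so only connections originating on the master itself can succeed, which matches the symptoms above. Once the master listens on a routable address, a quick reachability check from the local machine (assuming netcat is installed):

nc -vz spark.example.com 7077   # should report the port as open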

1 Answer:

Answer 0 (score: 0):

Your problem is that your spark-submit process is not able to talk to the Spark master. You did not mention your security group configuration, so I assume that is where the problem lies.

When configuring inbound security rules for Spark on EC2, I opened all ports (TCP and UDP) but restricted the source to my system's public IP address. That way, only machines on my network can reach the cluster.

If you want to tighten security further, enable only the following ports:

8080 - Spark master web UI

4040 - Spark application (driver) UI

8088 - Sparklr UI

7077 - Spark submit port
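
A sketch of opening one of these ports to a single source address with the AWS CLI (the security-group ID and CIDR below are placeholders; repeat per port):

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 7077 \
    --cidr 203.0.113.5/32   # your workstation's public IP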

Note: you need to open all ports between the master and the slaves in the security group; they must be able to communicate with each other freely. One way to do that without exposing those ports to the Internet is sketched below.
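
Assuming all cluster nodes share one security group, a rule whose source is that group itself allows unrestricted traffic between members only (the group ID is a placeholder):

aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol -1 \
    --source-group sg-0123456789abcdef0   # all protocols and ports, cluster-internal only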