I have installed a Spark standalone cluster on EC2 machines. The cluster consists of one master node and two worker nodes. When I try to submit a Spark job from my local machine to the remote master via the PySpark shell, I get a connection refused error.
From my local machine, I try to connect to the remote Spark master (an EC2 instance):
pyspark --master spark://spark.example.com:7077
Running the above command on my local machine produces the following error:
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: test.example.com/52.66.70.6:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: java.net.ConnectException: Connection refused
... 11 more
2018-08-31 08:58:09 ERROR StandaloneSchedulerBackend:70 - Application has been killed. Reason: All masters are unresponsive! Giving up.
2018-08-31 08:58:09 WARN StandaloneSchedulerBackend:66 - Application ID is not initialized yet.
2018-08-31 08:58:09 WARN StandaloneAppClient$ClientEndpoint:66 - Drop UnregisterApplication(null) because has not yet connected to master
2018-08-31 08:58:09 WARN MetricsSystem:66 - Stopping a MetricsSystem that is not running
2018-08-31 08:58:10 ERROR SparkContext:91 - Error initializing SparkContext.
When I run the same command after logging into one of my Spark nodes, the connection is established successfully.
/etc/hosts file:
127.0.0.1 localhost
127.0.0.1 spark.example.com #Changing this to floating/Public IP throws "Cannot Bind to port 7077" error
127.0.0.1 slave1
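For context, the lsof output further down shows the master listening on localhost:7077 only, which matches spark.example.com resolving to 127.0.0.1 here. A sketch of an /etc/hosts layout that avoids this, assuming a hypothetical EC2 private IP (on EC2 the NIC carries only the private address, which is why binding the public/floating IP fails):
127.0.0.1     localhost
172.31.0.10   spark.example.com   # placeholder private IP, not the public/floating one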
spark-env.sh:
SPARK_MASTER_HOST=spark.example.com
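A minimal spark-env.sh sketch that makes the master listen on a non-loopback interface (the IP below is a placeholder, not from the original post):
SPARK_MASTER_HOST=172.31.0.10   # hypothetical EC2 private IP; the master binds to whatever this resolves to, so a loopback mapping keeps it unreachable remotely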
The EC2 inbound security group is configured to allow "All traffic" from "Anywhere on the Internet" to all ports.
Below is the lsof output after logging into my Spark master node (EC2 instance):
lsof -i :7077
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 20671 ubuntu 237u IPv6 79763 0t0 TCP localhost:7077 (LISTEN)
java 20671 ubuntu 249u IPv6 80993 0t0 TCP localhost:7077->localhost:42553 (ESTABLISHED)
java 20671 ubuntu 250u IPv6 80994 0t0 TCP localhost:7077->localhost:42554 (ESTABLISHED)
java 20910 ubuntu 252u IPv6 80992 0t0 TCP localhost:42554->localhost:7077 (ESTABLISHED)
java 20912 ubuntu 251u IPv6 80991 0t0 TCP localhost:42553->localhost:7077 (ESTABLISHED)
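Every socket above is bound to localhost, so port 7077 is reachable only from the instance itself, regardless of the security group. As a quick check from the local machine, a plain TCP probe (assuming nc is installed; hostname as in the post):
nc -vz spark.example.com 7077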
Answer 0 (score: 0):
Your problem is that your Spark submit process cannot talk to the Spark master. You don't mention your security group configuration, so I am assuming that is the problem.
When I configured inbound security rules for Spark on EC2, I opened all ports (TCP and UDP) but set the source to my system's public IP address. That way only machines on my network can reach the cluster.
If you want to tighten security, open only the following ports (see the CLI sketch after the note below):
8080 - Spark UI
4040 - Spark worker UI
8088 - Sparklr UI
7077 - Spark submit port
Note: you need to open all ports between the master and the workers in the security group; they need to be able to communicate freely with each other.
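As a sketch only (not part of the original answer), these rules could be created with the AWS CLI; the security group ID and source CIDR below are placeholders:
# allow spark-submit from one known public IP (placeholder values)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 7077 --cidr 203.0.113.7/32
# allow all traffic between cluster members by letting the group reference itself
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol -1 --source-group sg-0123456789abcdef0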