Multiple IP addresses and Host Names used by Spark Driver and Master

Date: 2015-07-31 19:39:37

Tags: apache-spark

The Spark Master listens on several ports. Unfortunately, the IP address / hostname scheme differs among them, and connections often fail as a result.

We are then left to wonder how to fix the connection problems, since Spark decides on its own how to translate among the following (a small resolution check follows the list):

  • hostname
  • hostname.local (on Mac OS X)
  • hostname.domain
  • localhost
  • localhost.localdomain
  • 127.0.0.1
  • external IP address
  • internal IP address (on AWS)
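
To see how these variants resolve on a given machine, a quick check like the sketch below can help. This uses only the Python standard library; nothing Spark-specific is assumed.

import socket

# Gather the naming variants Spark might choose among on this machine.
candidates = [
    socket.gethostname(),     # hostname
    socket.getfqdn(),         # hostname.domain, or hostname.local on a Mac
    "localhost",
    "localhost.localdomain",
    "127.0.0.1",
]

for name in candidates:
    try:
        # Print the address each variant actually resolves to.
        print("%-25s -> %s" % (name, socket.gethostbyname(name)))
    except socket.error as err:
        print("%-25s -> unresolvable (%s)" % (name, err))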

The important consideration: some of the networking clients/connections need an exact string match to contact the master successfully, so in that case 127.0.0.1 is not the same as hostname. I have seen cases where hostname works and hostname.local does not (a Mac-specific problem), but then the former stops working as well, and I lack the tools to troubleshoot why.

The --master option provides further opportunities for confusion on Linux when the machine has both an internal and an external IP address.
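
One way to reduce that ambiguity is to pin Spark to a single address before the driver starts. The sketch below is illustrative, not from the original setup: SPARK_LOCAL_IP and spark.driver.host are real Spark settings, but the 10.0.0.5 address and the app name are placeholders.

import os

# Pin Spark to one interface before the JVM launches (placeholder address).
os.environ["SPARK_LOCAL_IP"] = "10.0.0.5"

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("bind-check")               # hypothetical app name
        .set("spark.driver.host", "10.0.0.5"))  # must be reachable from the master
sc = SparkContext(master="spark://10.0.0.5:7077", conf=conf)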

Below is an example from my Mac. I see other patterns on AWS and yet others on standalone clusters. It is all perplexing and time-consuming, since it is not clearly documented:

  • where the mappings occur
  • how to achieve a consistent master address string across:
    • master
    • master web UI
    • Akka address for the master

Below is the output when the --master option was provided to spark-submit.

--master spark://mellyrn:7077 

Notice the variety of IP addresses and hostnames:

http://25.x.x.x:4040
akka.tcp://sparkMaster@mellyrn:7077
mellyrn/127.0.0.1:7077

Here is the output on the Mac:

15/07/31 12:21:34 INFO SparkEnv: Registering OutputCommitCoordinator
15/07/31 12:21:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/07/31 12:21:34 INFO SparkUI: Started SparkUI at http://25.101.19.24:4040
15/07/31 12:21:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@mellyrn:7077/user/Master...
15/07/31 12:21:35 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@mellyrn:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@mellyrn:7077
15/07/31 12:21:35 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@mellyrn:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: mellyrn/127.0.0.1:7077
15/07/31 12:21:54 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@mellyrn:7077/user/Master...
15/07/31 12:21:54 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@mellyrn:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@mellyrn:7077
15/07/31 12:21:54 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@mellyrn:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: mellyrn/127.0.0.1:7077

On Linux, the Spark connection with the --master option does work (though .setMaster() does not work reliably; the two are contrasted in the sketch below). Yet even on Linux, a variety of master/driver strings is generated.
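
For reference, the two ways of setting the master are shown below. This sketch is illustrative, reusing the mellyrn address from the logs above.

from pyspark import SparkConf, SparkContext

# Option 1: hard-code the master in the application via .setMaster().
# The string must exactly match what the master registered under.
conf = SparkConf().setAppName("master-test").setMaster("spark://mellyrn:7077")
sc = SparkContext(conf=conf)

# Option 2 (the one that worked reliably above): omit the master from the
# code entirely and pass it on the command line instead:
#   spark-submit --master spark://mellyrn:7077 my_app.py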

2 Answers:

Answer 0 (score: 2):

Found the problem: Spark was binding to a different local interface. I have a VPN client on the 25.x.x.x address, but the hostname pings to 10.x.x.x. This may be a bug in Spark; I will check whether a JIRA has already been filed.
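
A quick way to confirm this kind of mismatch is to compare the address the driver advertised (25.101.19.24 in the SparkUI log line above) with what the hostname actually resolves to. A minimal sketch:

import socket

advertised = "25.101.19.24"                 # from the SparkUI log line above
resolved = socket.gethostbyname("mellyrn")  # what the hostname maps to locally

# If these differ, Spark bound to one interface (e.g. the VPN's) while the
# hostname resolves to another, and exact-string matches against the master
# address will fail.
print("advertised:", advertised)
print("resolved:  ", resolved)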

Answer 1 (score: -1):

I ran into the same problem recently. Whenever I used "collect", it kept failing with "Exception: could not open socket".

But when I connected to the VPN, it worked fine. I tried to unify all the hostnames of the nodes.
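
Unifying the names means making every node resolve every other node's name to the same reachable address (typically via consistent /etc/hosts entries). A small sketch to verify this from each machine; the worker names are placeholders:

import socket

# Run on every node and compare the output across machines.
nodes = ["mellyrn", "worker1", "worker2"]  # placeholder node names

for n in nodes:
    try:
        print("%-10s -> %s" % (n, socket.gethostbyname(n)))
    except socket.error as err:
        print("%-10s -> unresolvable (%s)" % (n, err))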