Although the EC2 management console shows a running cluster (1 master, 2 slaves), I cannot access the Spark UI on port 8080. When I run the start command I get the same RSYNC error every time, as shown in the cluster start output below.
Can anyone at least explain what RSYNC is trying to do? I have read some related posts, but they assume the reader already knows RSYNC well, and for someone ignorant of it like me the explanations are not obvious :-)
ubuntu@eu-west:~/spark-1.5.2/ec2$ ./spark-ec2 --key-pair=westkey --identity-file=/home/ubuntu/westkey.pem --region=eu-west-1 start my-spark-cluster
Searching for existing cluster my-spark-cluster in region eu-west-1...
Found 1 master, 2 slaves.
Starting slaves...
Starting master...
Waiting for cluster to enter 'ssh-ready' state..........
Cluster is now in 'ssh-ready' state. Waited 259 seconds.
Cloning spark-ec2 scripts from https://github.com/amplab/spark-ec2/tree/branch-1.5 on master...
Warning: Permanently added 'ec2-54-171-121-28.eu-west-1.compute.amazonaws.com,172.31.16.35' (ECDSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".
Connection to ec2-54-171-121-28.eu-west-1.compute.amazonaws.com closed.
Deploying files to master...
Warning: Permanently added 'ec2-54-171-121-28.eu-west-1.compute.amazonaws.com,172.31.16.35' (ECDSA) to the list of known hosts.
protocol version mismatch -- is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(174) [sender=3.1.0]
Traceback (most recent call last):
  File "./spark_ec2.py", line 1517, in <module>
    main()
  File "./spark_ec2.py", line 1509, in main
    real_main()
  File "./spark_ec2.py", line 1500, in real_main
    setup_cluster(conn, master_nodes, slave_nodes, opts, False)
  File "./spark_ec2.py", line 836, in setup_cluster
    modules=modules
  File "./spark_ec2.py", line 1111, in deploy_files
    subprocess.check_call(command)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rsync', '-rv', '-e', 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ubuntu/westkey.pem', '/tmp/tmpJduy3a/', u'root@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com:/']' returned non-zero exit status 2
Answer 0 (score: 2)
It is trying to execute the command
rsync -rv -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ubuntu/westkey.pem" /tmp/tmpJduy3a/ root@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com:/
which means it copies the directory tree from /tmp/tmpJduy3a/ on the "source" machine to the root filesystem of the target machine (ec2-54-171-121-28).
The -rv options tell rsync to copy directories recursively (-r) and to print verbose details about what was transferred (-v).
To my knowledge, EC2 instances do not allow SSH access as the root user, so when I need to rsync to an EC2 instance in a way that requires root privileges, I have to add --rsync-path="sudo rsync" as an argument to my rsync command.
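For illustration only (not runnable outside the asker's cluster), the failing command reworked along those lines might look like the following; connecting as the "ubuntu" user instead of root is an assumption based on the "Please login as the user ubuntu" message in the log:

```shell
rsync -rv --rsync-path="sudo rsync" \
  -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ubuntu/westkey.pem" \
  /tmp/tmpJduy3a/ \
  ubuntu@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com:/
```

Here --rsync-path tells the local rsync which command to run on the remote side, so the remote copy runs under sudo even though the SSH login is an unprivileged user.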
PS: I can't comment on the original question yet, but since you asked what rsync is trying to do, I believe this qualifies as an answer.
Answer 1 (score: 0)
Got the answer: "spark-ec2 does not support launching clusters running Ubuntu. It is built for use with a custom Amazon Linux AMI; it expects you to log in as root and relies on specific versions of Unix utilities that may not exist on other distributions." Errors when deploying spark on EC2 with specified AMI-ID