AWS EC2 Spark cluster: rsync error

Date: 2015-12-06 15:53:39

Tags: amazon-ec2 apache-spark rsync

Although the EC2 management console shows a running cluster (1 master, 2 slaves), I cannot access the Spark UI on port 8080. Every time I run the launch command I get the same rsync error, as shown in the cluster start output below.

At the very least, can someone explain what rsync is trying to do here? I have read some related posts, but the people there seem to know rsync very well, and for an ignoramus like me the explanations are not obvious :-)

ubuntu@eu-west:~/spark-1.5.2/ec2$ ./spark-ec2 --key-pair=westkey --identity-file=/home/ubuntu/westkey.pem --region=eu-west-1 start my-spark-cluster
    Searching for existing cluster my-spark-cluster in region eu-west-1...
    Found 1 master, 2 slaves.
    Starting slaves...
    Starting master...
    Waiting for cluster to enter 'ssh-ready' state..........
    Cluster is now in 'ssh-ready' state. Waited 259 seconds.
    Cloning spark-ec2 scripts from https://github.com/amplab/spark-ec2/tree/branch-1.5 on master...
    Warning: Permanently added 'ec2-54-171-121-28.eu-west-1.compute.amazonaws.com,172.31.16.35' (ECDSA) to the list of known hosts.
    Please login as the user "ubuntu" rather than the user "root".

    Connection to ec2-54-171-121-28.eu-west-1.compute.amazonaws.com closed.
    Deploying files to master...
    Warning: Permanently added 'ec2-54-171-121-28.eu-west-1.compute.amazonaws.com,172.31.16.35' (ECDSA) to the list of known hosts.
    protocol version mismatch -- is your shell clean?
    (see the rsync man page for an explanation)
    rsync error: protocol incompatibility (code 2) at compat.c(174) [sender=3.1.0]
    Traceback (most recent call last):
      File "./spark_ec2.py", line 1517, in <module>
        main()
      File "./spark_ec2.py", line 1509, in main
        real_main()
      File "./spark_ec2.py", line 1500, in real_main
        setup_cluster(conn, master_nodes, slave_nodes, opts, False)
      File "./spark_ec2.py", line 836, in setup_cluster
        modules=modules
      File "./spark_ec2.py", line 1111, in deploy_files
        subprocess.check_call(command)
      File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['rsync', '-rv', '-e', 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ubuntu/westkey.pem', '/tmp/tmpJduy3a/', u'root@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com:/']' returned non-zero exit status 2

2 Answers:

Answer 0 (score: 2)

The script is trying to execute the command

rsync -rv -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ubuntu/westkey.pem" /tmp/tmpJduy3a/ root@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com:/

This means it tries to copy a directory tree from /tmp/tmpJduy3a/ on the "source" machine into the root filesystem of the target machine (ec2-54-171-121-28).
The -rv options tell rsync to copy directories recursively (-r) and to print verbose output about what it transferred (-v).
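As a minimal local sketch (throwaway temp directories, nothing from the actual cluster), the effect of those two flags can be reproduced without any EC2 machines:

```shell
# Local illustration of `rsync -rv`: -r recurses into directories,
# -v lists each file as it is transferred. All paths here are temporary
# and made up for the example.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/conf"
echo "spark.executor.memory 2g" > "$src/conf/spark-defaults.conf"

# The trailing slash on the source means "copy the *contents* of src",
# mirroring the `/tmp/tmpJduy3a/ root@host:/` form used by spark-ec2.
rsync -rv "$src/" "$dst/"

cat "$dst/conf/spark-defaults.conf"
rm -rf "$src" "$dst"
```

The same relative layout (conf/spark-defaults.conf) appears under the destination directory, which is exactly what spark-ec2 relies on when it deploys its template tree onto the master's root filesystem.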

As far as I know, EC2 instances do not allow ssh access as the root user, so whenever I need to rsync to an EC2 instance with root privileges I have to add --rsync-path="sudo rsync" as an argument to my rsync command.
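A sketch of what that would look like applied to the command from the traceback (not a verified fix for this cluster; whether it helps depends on the AMI accepting the non-root login at all):

```shell
# Same transfer spark-ec2 attempted, plus --rsync-path so the remote side
# runs rsync under sudo; the login user is switched to "ubuntu" since the
# AMI refuses direct root login. Command fragment only, not run here.
rsync -rv --rsync-path="sudo rsync" \
  -e "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ubuntu/westkey.pem" \
  /tmp/tmpJduy3a/ \
  ubuntu@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com:/
```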

PS: I can't comment on the original question yet, but since you asked what rsync is trying to do, I believe this qualifies as an answer.
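For what it's worth, rsync's "is your shell clean?" hint refers to anything that writes extra bytes to stdout during a non-interactive ssh session, which corrupts the rsync protocol stream; the "Please login as the user \"ubuntu\"" message in the log above looks like exactly that kind of pollution. The rsync man page suggests a check along these lines, shown here with the master's hostname filled in from the question:

```shell
# If out.dat ends up non-empty, the remote login printed something
# (banner, motd, refusal message) onto stdout, which breaks rsync.
ssh -i /home/ubuntu/westkey.pem \
  root@ec2-54-171-121-28.eu-west-1.compute.amazonaws.com /bin/true > out.dat
test -s out.dat && echo "shell is NOT clean" || echo "shell is clean"
```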

Answer 1 (score: 0)

Got the answer: "spark-ec2 does not support launching clusters running Ubuntu. It is built around custom Amazon Linux AMIs, which expect you to log in as the root user and to have specific versions of Unix utilities that may not be present on other distributions." Errors when deploying spark on EC2 with specified AMI-ID