AMPLab Big Data Benchmark on EC2 fails with Spark errors

Time: 2016-02-11 21:33:42

Tags: amazon-ec2 apache-spark

I am trying to run the Big Data Benchmark on my EC2 cluster against my own fork of Spark, located here, which only modifies a few files in Spark core. The cluster consists of 1 master and 2 slaves of type m1.large, and I launched it with the ec2 scripts bundled with Spark. The launch completes without problems and I can log in to the master successfully. However, when I try to run the benchmark from the master with the command
./runner/prepare-benchmark.sh --shark --aws-key-id=xxxxxxxx --aws-key=xxxxxxxx --shark-host=<my-spark-master> --shark-identity-file=/root/.ssh/id_rsa --scale-factor=1
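
For reference, the cluster itself was brought up with the stock spark-ec2 script that ships with Spark, roughly like the sketch below (the key pair name, identity file, and cluster name are placeholders; the region is inferred from the hostnames in the error output):

# Approximate launch command using the spark-ec2 script bundled with Spark
# (key pair, identity file, and cluster name below are placeholders)
cd spark/ec2
./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 -t m1.large -r us-west-2 launch benchmark-cluster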

Running prepare-benchmark.sh produces the following error:

=== IMPORTING BENCHMARK DATA FROM S3 ===
bash: /root/ephemeral-hdfs/bin/hdfs: No such file or directory
Connection to ec2-54-201-169-165.us-west-2.compute.amazonaws.com closed.
bash: /root/mapreduce/bin/start-mapred.sh: No such file or directory
Connection to ec2-54-201-169-165.us-west-2.compute.amazonaws.com closed.
Traceback (most recent call last):
  File "./prepare_benchmark.py", line 606, in <module>
    main()
  File "./prepare_benchmark.py", line 594, in main
    prepare_shark_dataset(opts)
  File "./prepare_benchmark.py", line 192, in prepare_shark_dataset
    ssh_shark("/root/mapreduce/bin/start-mapred.sh")
  File "./prepare_benchmark.py", line 180, in ssh_shark
    ssh(opts.shark_host, "root", opts.shark_identity_file, command)
  File "./prepare_benchmark.py", line 139, in ssh
    (identity_file, username, host, command), shell=True)
  File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'ssh -t -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa root@ec2-54-201-169-165.us-west-2.compute.amazonaws.com 'source /root/.bash_profile; /root/mapreduce/bin/start-mapred.sh'' returned non-zero exit status 127
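
Exit status 127 means "command not found", which matches the two "No such file or directory" messages: /root/ephemeral-hdfs/bin/hdfs and /root/mapreduce/bin/start-mapred.sh apparently do not exist on the master. A quick way to confirm this would be something like the following (using the same master host and identity file as above):

# Check whether the HDFS and MapReduce scripts the benchmark expects are present on the master
ssh -i /root/.ssh/id_rsa root@ec2-54-201-169-165.us-west-2.compute.amazonaws.com \
    'ls -ld /root/ephemeral-hdfs /root/mapreduce; ls -l /root/ephemeral-hdfs/bin/hdfs /root/mapreduce/bin/start-mapred.sh'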

I have tried terminating the cluster and launching it again several times, but the problem persists. What could be the issue?

0 Answers:

There are no answers to this question yet.