Apache spark-ec2 script: "ERROR: Unknown Spark version". Is init.sh broken?

Date: 2016-11-19 08:59:17

Tags: apache-spark amazon-ec2 spark-ec2

I am trying to launch AWS EC2 instances with the spark-ec2 script, but I get this error:

Initializing spark
--2016-11-18 22:33:06--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.3-bin-hadoop1.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.1.3
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.1.3|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-11-18 22:33:06 ERROR 404: Not Found.
ERROR: Unknown Spark version

My local Spark installation came from spark-1.6.3-bin-hadoop2.6.tgz, so the installer should not be trying to fetch spark-1.6.3-bin-hadoop1.tgz. In init.sh, that package is downloaded when HADOOP_MAJOR_VERSION == 1:

  if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
    wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop1.tgz
  elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
    wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-cdh4.tgz
  else
    wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.4.tgz
  fi
  if [ $? != 0 ]; then
    echo "ERROR: Unknown Spark version"
    return -1
  fi
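
To reproduce the failing request without rerunning the whole launcher, the URL construction from that excerpt can be mirrored in a small shell function. This is a sketch: `spark_package_url` is a made-up helper name, but the base URL and the hadoop1/cdh4/hadoop2.4 suffixes are copied from the init.sh code above.

```shell
#!/bin/sh
# Rebuild the download URL exactly as init.sh does, so the 404 from the
# log can be reproduced and inspected by hand with wget or curl.
spark_package_url() {
  spark_version="$1"
  hadoop_major="$2"
  # Same branch logic as the init.sh excerpt: 1 -> hadoop1, 2 -> cdh4,
  # anything else -> hadoop2.4.
  case "$hadoop_major" in
    1) suffix="hadoop1" ;;
    2) suffix="cdh4" ;;
    *) suffix="hadoop2.4" ;;
  esac
  echo "http://s3.amazonaws.com/spark-related-packages/spark-${spark_version}-bin-${suffix}.tgz"
}

# HADOOP_MAJOR_VERSION=1 selects the hadoop1 package that returned
# 404 Not Found in the log above.
spark_package_url 1.6.3 1
```

Feeding the printed URL to `wget --spider` confirms whether the package exists on S3 before a full cluster launch is attempted.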

The problems are:

- There is no Spark build for hadoop1 at http://s3.amazonaws.com/spark-related-packages, which is the immediate reason the download fails.

- HADOOP_MAJOR_VERSION appears to be set to 1 during installation, even though my setup uses Hadoop 2.x, which triggers the problem above.

- spark_ec2.py pulls the latest spark-ec2 from GitHub during installation, so I don't see an obvious local fix. I'm not comfortable forking the repository and hacking this script directly.

Any ideas on how to resolve this?

1 answer:

Answer 0 (score: 0)

Solved the problem by passing this option when invoking the spark-ec2 script locally:

  --hadoop-major-version=2

See: https://github.com/amplab/spark-ec2/issues/43
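
For context, a full launch invocation with that option might look like the sketch below. The cluster name, key pair, identity file, and region are placeholders, not values from the original post; only the --hadoop-major-version flag is the fix being described.

```shell
# Hypothetical spark-ec2 launch with the Hadoop major version pinned to 2,
# so init.sh downloads a package that actually exists on S3.
./spark-ec2 \
  --key-pair=my-keypair \
  --identity-file=~/.ssh/my-keypair.pem \
  --region=us-east-1 \
  --hadoop-major-version=2 \
  launch my-spark-cluster
```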