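I am using CDH 5.9.0 for my cluster setup. The default Spark service package shipped by Cloudera is 1.6.0. I need to upgrade it to 1.6.3, because a distributed cache issue was fixed in the following git commit: https://github.com/RicoGit/spark/commit/e5f1d9c8f9c94615322aaf7508e753307f553d53
I would like to know a clean way to upgrade the Spark service deployed through Cloudera. As an extension of this, how can I upgrade to Spark 2.0 on the same cluster?
Thanks.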
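Answer 0 (score: 2)
Cloudera recently released Spark 2.0 parcels, which you can download from the spark archive.
Follow the link for the installation steps.
Note: Apache Spark 2.0 can only be installed on CDH 5.7, CDH 5.8, or CDH 5.9 clusters, and it requires Cloudera Manager (CM) 5.8.3, 5.9, or later.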
Answer 1 (score: 2)
Just follow these steps:
https://gist.github.com/shredder47/ce2f158a2a3907c0d264c5e9e4aab2fa
Or:
java -version
sudo yum remove java
sudo yum install java-1.8.0-openjdk
source ~/.bash_profile
Download Spark 2.4.7 With Hadoop 2.6 (Tar)
Extract contents.
Move the contents of the folder to :
/usr/local/spark
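The extract-and-move step can be sketched as a small shell function. This is only an illustration: the function name install_spark_tarball and the tarball file name in the usage note are assumptions, not something the steps above specify.

```shell
#!/usr/bin/env bash
# Sketch: unpack a Spark tarball so that its contents (bin/, jars/, conf/, ...)
# land directly in the destination folder, e.g. /usr/local/spark.
# install_spark_tarball is a hypothetical helper name used for illustration.
install_spark_tarball() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  # --strip-components=1 drops the archive's single top-level folder,
  # which has the same effect as extracting and then moving its contents.
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

Run as root, this would be called as something like: install_spark_tarball spark-2.4.7-bin-hadoop2.6.tgz /usr/local/spark (assuming that is the name of the file you downloaded).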
Now,
Open:
/usr/bin/pyspark
/usr/bin/spark-shell
/usr/bin/spark-submit
and change the exec line in each file to the matching line below:
'exec /usr/local/spark/bin/pyspark "$@"'
'exec /usr/local/spark/bin/spark-shell "$@"'
'exec /usr/local/spark/bin/spark-submit "$@"'
Now run Spark to check the version (for example, spark-submit --version).
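If you would rather script the launcher edits than do them by hand, a minimal sketch follows. It assumes it is acceptable to replace each launcher with a two-line wrapper; the function name rewrite_launchers and the .orig backup suffix are illustrative choices, not part of Spark or CDH.

```shell
#!/usr/bin/env bash
# Sketch: point pyspark, spark-shell and spark-submit at a Spark install.
# rewrite_launchers is a hypothetical helper; it overwrites each launcher
# with a wrapper that execs the binary under <spark_dir>/bin.
rewrite_launchers() {
  local bin_dir="$1" spark_dir="$2" name
  for name in pyspark spark-shell spark-submit; do
    # Back up the existing launcher once so the change can be reverted.
    if [ -f "$bin_dir/$name" ] && [ ! -f "$bin_dir/$name.orig" ]; then
      cp -p "$bin_dir/$name" "$bin_dir/$name.orig"
    fi
    # "$@" is written literally so the wrapper forwards all its arguments.
    printf '#!/bin/bash\nexec %s/bin/%s "$@"\n' "$spark_dir" "$name" > "$bin_dir/$name"
    chmod +x "$bin_dir/$name"
  done
}
```

Run as root, with the paths from the steps above: rewrite_launchers /usr/bin /usr/local/spark. To undo the change, copy each .orig file back over its launcher.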