I'm launching a Spark-based hiveserver2 on Amazon EMR, and it has extra classpath dependencies. Because of this bug in Amazon EMR:
https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/
my classpath cannot be passed through the "--driver-class-path" option, so I had to modify /etc/spark/conf/spark-env.conf to add the extra classpath:
# Add Hadoop libraries to Spark classpath
SPARK_CLASSPATH="${SPARK_CLASSPATH}:${HADOOP_HOME}/*:${HADOOP_HOME}/../hadoop-hdfs/*:${HADOOP_HOME}/../hadoop-mapreduce/*:${HADOOP_HOME}/../hadoop-yarn/*:/home/hadoop/git/datapassport/*"
where "/home/hadoop/git/datapassport/*" is my classpath.
However, after the server starts successfully, the Spark environment parameters show that my change had no effect:
spark.driver.extraClassPath :/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
Is this configuration file obsolete? Where is the new file, and how can I fix this problem?
Answer 0 (score: 2)
Have you tried setting spark.driver.extraClassPath in spark-defaults? Something like this:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "${SPARK_CLASSPATH}:${HADOOP_HOME}/*:${HADOOP_HOME}/../hadoop-hdfs/*:${HADOOP_HOME}/../hadoop-mapreduce/*:${HADOOP_HOME}/../hadoop-yarn/*:/home/hadoop/git/datapassport/*"
    }
  }
]
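For reference, a minimal sketch of how such a classification could be supplied at cluster-creation time (the file name configurations.json, release label, and instance settings are placeholders; note also that the ${...} shell variables above are likely written into spark-defaults.conf verbatim rather than expanded, so you may need to substitute absolute paths):

# Hypothetical usage: save the classification above as configurations.json,
# then reference it when creating the cluster
aws emr create-cluster --name "spark-cluster" \
    --release-label emr-4.2.0 \
    --applications Name=Spark \
    --configurations file://./configurations.json \
    --instance-type m3.xlarge --instance-count 3 \
    --use-default-roles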
Answer 1 (score: 2)
You can use --driver-class-path.
Launch spark-shell on the master node of a fresh EMR cluster:
spark-shell --master yarn-client
scala> sc.getConf.get("spark.driver.extraClassPath")
res0: String = /etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
Use a --bootstrap-action to add your JAR file to the EMR cluster, as in the sketch below.
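A minimal sketch of such a bootstrap action, assuming the JAR is staged in an S3 bucket (the bucket and file names are placeholders):

#!/bin/bash
# copy-jar.sh: hypothetical bootstrap script, staged in S3.
# Runs on every node at startup and copies the custom JAR to a local path
# that the driver classpath below can reference.
aws s3 cp s3://my-bucket/my-custom-jar.jar /home/hadoop/my-custom-jar.jar

Register it at cluster creation with --bootstrap-actions Path=s3://my-bucket/copy-jar.sh.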
When you invoke spark-submit, prepend (or append) your JAR file to the value of extraClassPath you got from spark-shell:
spark-submit --master yarn-cluster --driver-class-path /home/hadoop/my-custom-jar.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
This worked for me with EMR releases 4.1 and 4.2.
The process for building spark.driver.extraClassPath may change between releases, which may be why SPARK_CLASSPATH no longer works.
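Since the default value may differ between releases, a hedged sketch that looks it up from /etc/spark/conf/spark-defaults.conf at submit time instead of hard-coding it (the grep/awk extraction and the JAR and application paths are assumptions):

# Hypothetical: read the cluster's default driver classpath from
# spark-defaults.conf, then prepend the custom JAR
BASE_CP=$(grep '^spark.driver.extraClassPath' /etc/spark/conf/spark-defaults.conf | awk '{print $2}')
spark-submit --master yarn-cluster \
    --driver-class-path "/home/hadoop/my-custom-jar.jar:${BASE_CP}" \
    my-app.jar  # placeholder application JAR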