I am trying to build an edge node with Docker for HDP 2.6.1. Everything is available and running except Spark support. I can install and run pyspark, but only if I comment out enableHiveSupport(). I have copied hive-site.xml to /etc/spark2/conf as well as via Ambari, and all the Spark configs match the cluster settings. But I still get this error:
17/10/27 02:35:57 WARN conf.HiveConf: HiveConf of name hive.groupby.position.alias does not exist
17/10/27 02:35:57 WARN conf.HiveConf: HiveConf of name hive.mv.files.thread does not exist
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/shell.py", line 43, in <module>
    spark = SparkSession.builder\
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/session.py", line 187, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
>>> spark.createDataFrame([(1,'a'), (2,'b')], ['id', 'nm'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'spark' is not defined
I tried searching for this error, but all the results I found were Windows-specific issues related to permissions and hive-site.xml. I am building on centos:7.3.1611, installing the following:
RUN wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.6.1.0/hdp.repo
RUN cp hdp.repo /etc/yum.repos.d
RUN yum -y install hadoop sqoop spark2_2_6_1_0_129-master spark2_2_6_1_0_129-python hive-hcatalog
Answer 0 (score: 0)
So the solution to the above problem was that the hive-site.xml used by Spark needs to contain ONLY the hive.metastore.uris property and NOTHING ELSE. (Reference: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_spark-component-guide/content/spark-config-hive.html). Once you take out the other properties, it works like a charm!
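For illustration, a minimal hive-site.xml placed in /etc/spark2/conf might look like the sketch below. The metastore hostname and port are placeholders; substitute the values from your own cluster (the port is typically 9083 for the Hive metastore Thrift service):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- The ONLY property Spark needs here: where to find the Hive metastore. -->
  <property>
    <name>hive.metastore.uris</name>
    <!-- Placeholder host; replace with your metastore node. -->
    <value>thrift://your-metastore-host:9083</value>
  </property>
</configuration>
```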