Question

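Using HDP 2.5.3, I've been trying to debug some YARN container classpath issues.

Since HDP includes both Spark 1.6 and 2.0.0, there have been some conflicting versions.

Users I support can successfully use Spark2 with Hive queries in YARN client mode, but in cluster mode they get errors about tables not being found, or something similar, because the Metastore connection isn't established.

I am guessing that setting --driver-class-path /etc/spark2/conf:/etc/hive/conf or passing --files /etc/spark2/conf/hive-site.xml after spark-submit would work, but why isn't hive-site.xml already loaded from the conf folder?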
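As a rough illustration of the --files workaround guessed at above, the following sketch assembles a spark-submit command line that ships hive-site.xml with the job. The jar name and main class are hypothetical placeholders, and the sketch only builds and prints the command; it does not claim that this fixes the problem on every cluster.

```python
import shlex


def build_submit_command(app_jar, main_class,
                         hive_site="/etc/spark2/conf/hive-site.xml"):
    """Return a spark-submit argv for YARN cluster mode that uploads hive-site.xml.

    app_jar and main_class are placeholders supplied by the caller; the
    hive_site default is the path mentioned in the question.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--files", hive_site,   # copied into each container's working directory
        "--class", main_class,
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical application; replace with a real jar and class.
    cmd = build_submit_command("my-app.jar", "com.example.MyApp")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the arguments as a list makes it easy to hand the result to subprocess.run without worrying about shell quoting.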

According to the Hortonworks docs, hive-site.xml should be placed in $SPARK_HOME/conf, and it is...

I see hdfs-site.xml, core-site.xml, and other files that are part of HADOOP_CONF_DIR, for example. This is from the YARN UI container info:

2232355 4 drwx------ 2 yarn hadoop 4096 Aug 2 21:59 ./__spark_conf__
hadoop 2358 Aug 2 21:59 ./__spark_conf__/topology_script.py
hadoop 4676 Aug 2 21:59 ./__spark_conf__/yarn-env.sh
hadoop 569 Aug 2 21:59 ./__spark_conf__/topology_mappings.data
hadoop 945 Aug 2 21:59 ./__spark_conf__/taskcontroller.cfg
hadoop 620 Aug 2 21:59 ./__spark_conf__/log4j.properties
hadoop 8960 Aug 2 21:59 ./__spark_conf__/hdfs-site.xml
hadoop 2090 Aug 2 21:59 ./__spark_conf__/hadoop-metrics2.properties
hadoop 662 Aug 2 21:59 ./__spark_conf__/mapred-env.sh
hadoop 1308 Aug 2 21:59 ./__spark_conf__/hadoop-policy.xml
hadoop 1480 Aug 2 21:59 ./__spark_conf__/__spark_conf__.properties
hadoop 1602 Aug 2 21:59 ./__spark_conf__/health_check
hadoop 913 Aug 2 21:59 ./__spark_conf__/rack_topology.data
hadoop 1484 Aug 2 21:59 ./__spark_conf__/ranger-hdfs-audit.xml
hadoop 1020 Aug 2 21:59 ./__spark_conf__/commons-logging.properties
hadoop 5721 Aug 2 21:59 ./__spark_conf__/hadoop-env.sh
hadoop 281 Aug 2 21:59 ./__spark_conf__/slaves
hadoop 6407 Aug 2 21:59 ./__spark_conf__/core-site.xml
hadoop 812 Aug 2 21:59 ./__spark_conf__/rack-topology.sh
hadoop 1044 Aug 2 21:59 ./__spark_conf__/ranger-hdfs-security.xml
hadoop 4956 Aug 2 21:59 ./__spark_conf__/metrics.properties
hadoop 4221 Aug 2 21:59 ./__spark_conf__/task-log4j.properties
hadoop 64 Aug 2 21:59 ./__spark_conf__/ranger-security.xml
hadoop 19975 Aug 2 21:59 ./__spark_conf__/yarn-site.xml
hadoop 1006 Aug 2 21:59 ./__spark_conf__/ranger-policymgr-ssl.xml
hadoop 29 Aug 2 21:59 ./__spark_conf__/yarn.exclude
hadoop 1606 Aug 2 21:59 ./__spark_conf__/container-executor.cfg
hadoop 1000 Aug 2 21:59 ./__spark_conf__/ssl-server.xml
hadoop 1 Aug 2 21:59 ./__spark_conf__/dfs.exclude
hadoop 7660 Aug 2 21:59 ./__spark_conf__/mapred-site.xml
hadoop 14474 Aug 2 21:59 ./__spark_conf__/capacity-scheduler.xml
hadoop 884 Aug 2 21:59 ./__spark_conf__/ssl-client.xml
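A listing like the one above can be checked mechanically. The sketch below reports which expected configuration files are absent from a directory such as a container's __spark_conf__. The list of expected files is an assumption chosen for this question, not something Spark defines.

```python
import os

# Assumed set of files worth checking; adjust to the cluster at hand.
EXPECTED = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml", "hive-site.xml")


def missing_conf_files(conf_dir, expected=EXPECTED):
    """Return the expected file names that are not present in conf_dir."""
    present = set(os.listdir(conf_dir))
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    import sys
    # Usage: python check_conf.py /path/to/container/__spark_conf__
    for name in missing_conf_files(sys.argv[1]):
        print("missing:", name)
```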



As you can see, hive-site.xml is not there, even though I definitely have conf/hive-site.xml for spark-submit to pick up:

[spark@asthad006 conf]$ pwd && ls -l
/usr/hdp/2.5.3.0-37/spark2/conf
total 32
-rw-r--r-- 1 spark spark   742 Mar  6 15:20 hive-site.xml
-rw-r--r-- 1 spark spark   620 Mar  6 15:20 log4j.properties
-rw-r--r-- 1 spark spark  4956 Mar  6 15:20 metrics.properties
-rw-r--r-- 1 spark spark   824 Aug  2 22:24 spark-defaults.conf
-rw-r--r-- 1 spark spark  1820 Aug  2 22:24 spark-env.sh
-rwxr-xr-x 1 spark spark   244 Mar  6 15:20 spark-thrift-fairscheduler.xml
-rw-r--r-- 1 hive  hadoop  918 Aug  2 22:24 spark-thrift-sparkconf.conf

So I don't think I am supposed to place hive-site.xml in HADOOP_CONF_DIR, since HIVE_CONF_DIR is separate. My question is: how do we get Spark2 to pick up hive-site.xml without manually passing it as a parameter at runtime?

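EDIT: Naturally, since I'm on HDP, I am using Ambari. The previous cluster admin installed Spark2 clients on all of the machines, so all of the YARN NodeManagers that could be potential Spark drivers should have the same config files.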

Answer 1

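The way I understand it, in local or yarn-client modes...

1. The Launcher checks whether it needs Kerberos tokens for HDFS, YARN, Hive, HBase > hive-site.xml is searched for in the CLASSPATH by the Hive/Hadoop client libraries (including driver.extraClassPath, because the Driver runs inside the Launcher and the merged CLASSPATH is already built at this point).
2. The Driver checks which kind of Metastore to use for internal purposes: a standalone Metastore backed by a volatile Derby instance, or a regular Hive Metastore > that's $SPARK_CONF_DIR/hive-site.xml.
3. When using the Hive interface, a Metastore connection is used to read/write Hive metadata in the Driver > hive-site.xml is searched for in the CLASSPATH by the Hive/Hadoop client libraries (and the Kerberos token is used, if any).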

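So you can have one hive-site.xml stating that Spark should use an embedded, in-memory Derby instance as a sandbox (in-memory implying "stop leaving all these temp files behind you"), while another hive-site.xml gives the actual Hive Metastore URI. And all is well.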
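To tell which of the two kinds of hive-site.xml a given file is, one option is to read the hive.metastore.uris property from it. The sketch below assumes the standard Hadoop configuration layout (a configuration root containing property elements with name and value children); the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(path):
    """Parse a Hadoop-style XML config file into a {name: value} dict."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def metastore_uris(path):
    """Return the list of URIs in hive.metastore.uris, or [] if unset or empty."""
    value = read_hadoop_properties(path).get("hive.metastore.uris", "")
    return [uri.strip() for uri in value.split(",") if uri.strip()]


if __name__ == "__main__":
    import sys
    uris = metastore_uris(sys.argv[1])
    # An empty list suggests a local or embedded Metastore rather than a remote one.
    print(uris if uris else "no remote Metastore URI configured")
```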

Now, in yarn-cluster mode, all that mechanism pretty much explodes in a nasty, undocumented mess.

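The Launcher needs its own CLASSPATH settings to create the Kerberos tokens, otherwise it fails silently. Better go to the source code to find out which undocumented environment variable you should use. It may also need an override in some properties, because the hard-coded defaults suddenly are not the default any more (silently).

The Driver cannot tap the original $SPARK_CONF_DIR; it has to rely on what the Launcher has made available for upload. Does that include a copy of $SPARK_CONF_DIR/hive-site.xml? It looks like it does not.
So you are probably using a Derby instance as a stub.

And the Driver has to make do with whatever YARN has forced on the container CLASSPATH. Besides, the driver.extraClassPath additions do not take precedence by default; for that you have to force spark.yarn.user.classpath.first=true (which is translated to a standard Hadoop property whose exact name I can't remember right now, especially since there are multiple properties with similar names that may be deprecated and/or not working in Hadoop 2.x).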


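Bottom line: start your diagnostic again.

A. Are you really, really sure that the mysterious "Metastore connection errors" are caused by missing properties, and specifically the Metastore URI?

B. By the way, are your users explicitly using a HiveContext???

C. What exactly is the CLASSPATH that YARN presents to the Driver JVM, and what exactly is the CLASSPATH that the Driver presents to the Hadoop libraries when opening the Metastore connection?

D. If the CLASSPATH built by YARN is messed up for some reason, what would be the minimal fix: a change in precedence rules? An addition? Both?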
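For questions C and D, a small helper can show which hive-site.xml files are visible along a CLASSPATH string and in what order. This is only an approximation of JVM resource lookup: it assumes that the first match wins and it inspects plain directory entries only, ignoring jars and wildcard entries.

```python
import os


def find_on_classpath(classpath, resource="hive-site.xml"):
    """Return every copy of `resource` found in directory entries, in CLASSPATH order.

    The first element, if any, is the copy a JVM resource lookup would most
    likely return (assumption: no jar earlier on the path contains it).
    """
    hits = []
    for entry in classpath.split(os.pathsep):
        candidate = os.path.join(entry, resource)
        if entry and os.path.isdir(entry) and os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Run inside the environment whose CLASSPATH is being investigated.
    for path in find_on_classpath(os.environ.get("CLASSPATH", "")):
        print(path)
```

An empty result inside the driver container would be consistent with the Derby fallback described above, while several results point to a precedence question.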

Answer 2

You can use a Spark property and specify the path to hive-site.xml there.
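The property name did not survive in this copy of the answer. One Spark-on-YARN property that distributes files to the containers is spark.yarn.dist.files; whether that is the property the answer meant is an assumption. A minimal sketch of passing it on the command line, with a hypothetical jar and class:

```python
def with_dist_files(submit_options, files):
    """Return a copy of submit_options with spark.yarn.dist.files appended.

    submit_options must not yet contain the application jar, because
    spark-submit expects its options before the jar.
    """
    conf = "spark.yarn.dist.files=" + ",".join(files)
    return list(submit_options) + ["--conf", conf]


if __name__ == "__main__":
    options = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    options = with_dist_files(options, ["/etc/spark2/conf/hive-site.xml"])
    # Hypothetical application jar and main class.
    print(" ".join(options + ["--class", "com.example.MyApp", "my-app.jar"]))
```

The same key and value could instead go into spark-defaults.conf, which would avoid passing anything at runtime.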

Answer 3

In cluster mode, configuration is read from the conf directory of the machine that runs the driver container, not the one used for spark-submit.

Answer 4

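Found an issue with this.

If you create an org.apache.spark.sql.SQLContext before creating the Hive context, hive-site.xml is not picked up properly when the Hive context is created.

Solution: create the Hive context before creating another SQL context.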
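In PySpark terms, the ordering described above might look like the sketch below. It assumes the Spark 1.6-style HiveContext and SQLContext classes (still available, though deprecated, in Spark 2.0) and an existing SparkContext passed in by the caller; the function name is made up for illustration.

```python
def create_contexts(sc):
    """Create the Hive context first, then a plain SQL context.

    sc is an existing SparkContext. PySpark is imported lazily so this
    module can be loaded on machines where PySpark is not installed.
    """
    from pyspark.sql import HiveContext, SQLContext

    hive_ctx = HiveContext(sc)  # created first, so hive-site.xml is read here
    sql_ctx = SQLContext(sc)    # any other SQL context comes afterwards
    return hive_ctx, sql_ctx
```

With Spark 2.x, the usual alternative is a single SparkSession built with enableHiveSupport(), which avoids having two contexts at all.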

hive-site missing when using spark-submit YARN cluster mode
