Preventing Spark HiveContext from connecting to Hive

Asked: 2015-10-14 22:27:14

Tags: azure apache-spark hive hdinsight

I'm using HiveContext in Apache Spark 1.3 because I need its richer query support (compared to 1.3's SQLContext).

I'm running on an Azure HDInsight Spark cluster. The driver's HiveContext keeps trying to connect to a Hive metastore that doesn't exist, and that crashes the driver.

I don't need Hive support at all.

What's the best way to stop Spark's HiveContext from trying to connect to Hive? For example, is there a specific environment property to unset? (There are roughly 100 preset properties that could be relevant.)
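For completeness: if one can live without HiveContext's richer SQL dialect (the question explicitly prefers it, so this is only a partial answer), a plain SQLContext never instantiates a Hive metastore client in Spark 1.3. A minimal sketch; the app name and input path are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("no-hive"))

// Plain SQLContext: unlike HiveContext, no HiveMetaStoreClient is ever created,
// so there is nothing to time out against a missing metastore.
val sqlContext = new SQLContext(sc)
val df = sqlContext.jsonFile("wasb:///example/data.json") // hypothetical input path
```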

Edit: stack trace:

15/10/14 06:35:29 WARN metastore: Failed to connect to the MetaStore Server...
15/10/14 06:35:50 WARN metastore: Failed to connect to the MetaStore Server...
15/10/14 06:36:11 WARN metastore: Failed to connect to the MetaStore Server...
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
        at org.apache.spark.sql.hive.HiveContext.sessionState$lzycompute(HiveContext.scala:241)
        at org.apache.spark.sql.hive.HiveContext.sessionState(HiveContext.scala:237)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.<init>(HiveContext.scala:385)
        at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:91)
        at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:50)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:728)
        at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:564)
        ..<snip>..
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
        ... 47 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
        ... 52 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException:
connect timed out
        at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
        at org.apache.spark.sql.hive.HiveContext.sessionState$lzycompute(HiveContext.scala:241)
        at org.apache.spark.sql.hive.HiveContext.sessionState(HiveContext.scala:237)
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.<init>(HiveContext.scala:385)
        at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:91)
        at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:50)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
        at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:728)
        at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:564)
        ..<snip>..
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
        ... 59 more
)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:382)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
        ... 57 more

1 answer:

Answer 0 (score: 2):

The relevant property is hive.metastore.uris.

It was preset to thrift://headnodehost:9083 by the preloaded C:\apps\dist\spark-1.3.1.2.2.7.1-0004\hive-site.xml. That file comes earlier in the generated CLASSPATH than my own hive-site.xml, which was therefore ignored.
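For reference, inside the preloaded hive-site.xml the preset looks roughly like this (the URI value is taken from the error above; the surrounding markup is the standard Hadoop configuration format):

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://headnodehost:9083</value>
  </property>
</configuration>
```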

I couldn't find a simple, working way to override that property value. (If you know one, please leave a comment.)

As a hack of a workaround, I simply moved hive-site.xml out of the way. Of course, this has to be done manually over RDP (which you must enable on the head node).
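The "move it aside" workaround amounts to renaming the file so the classpath scan no longer picks it up. A sketch in POSIX shell (on the HDInsight head node the file lives under the Spark distribution directory shown above and you'd do the equivalent in a Windows shell over RDP; the directory below is a stand-in):

```shell
# Stand-in for the Spark conf directory on the head node (hypothetical path).
CONF_DIR=/tmp/spark-conf-demo
mkdir -p "$CONF_DIR"
printf '<configuration/>\n' > "$CONF_DIR/hive-site.xml"

# Rename hive-site.xml so nothing on the CLASSPATH resolves it anymore;
# renaming (rather than deleting) makes the change easy to revert.
mv "$CONF_DIR/hive-site.xml" "$CONF_DIR/hive-site.xml.disabled"

ls "$CONF_DIR"
```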