I'm using HiveContext with Apache Spark 1.3 because I need its richer query support (vs. 1.3's SQLContext).

I'm running on an Azure HDInsight Spark cluster. The driver's HiveContext keeps trying to connect to a Hive metastore that doesn't exist, and that kills the driver.

I don't need Hive support at all.

What is the best way to stop Spark's HiveContext from attempting to connect to Hive? For example, is there a specific environment property to unset? (There are ~100 preset properties that could plausibly be relevant.)
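For context, here is a minimal sketch of the kind of driver code that triggers the failure (the app name and input path are illustrative, not from my actual job):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hivecontext-example"))
    // Constructing the HiveContext succeeds; the metastore connection is
    // only attempted lazily, on the first query below.
    val sqlContext = new HiveContext(sc)
    // This is where the driver hangs retrying the metastore, then throws.
    val df = sqlContext.jsonFile("wasb:///example/data/sample.json")
    df.printSchema()
  }
}
```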
Edit: Stacktrace:
15/10/14 06:35:29 WARN metastore: Failed to connect to the MetaStore Server...
15/10/14 06:35:50 WARN metastore: Failed to connect to the MetaStore Server...
15/10/14 06:36:11 WARN metastore: Failed to connect to the MetaStore Server...
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.spark.sql.hive.HiveContext.sessionState$lzycompute(HiveContext.scala:241)
at org.apache.spark.sql.hive.HiveContext.sessionState(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.<init>(HiveContext.scala:385)
at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:91)
at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:50)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:728)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:564)
..<snip>..
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 47 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 52 more
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException:
connect timed out
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
at org.apache.spark.sql.hive.HiveContext.sessionState$lzycompute(HiveContext.scala:241)
at org.apache.spark.sql.hive.HiveContext.sessionState(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext$QueryExecution.<init>(HiveContext.scala:385)
at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:91)
at org.apache.spark.sql.hive.HiveContext.executePlan(HiveContext.scala:50)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:728)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:564)
..<snip>..
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 59 more
)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:382)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
... 57 more
Answer 0 (score: 2)
The relevant property is `hive.metastore.uris`.

It had been preset to `thrift://headnodehost:9083` by the preloaded `C:\apps\dist\spark-1.3.1.2.2.7.1-0004\hive-site.xml`, which comes earlier in the generated CLASSPATH than my own `hive-site.xml`, so mine was ignored.
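For reference, the preset corresponds to the usual `hive-site.xml` property entry. This is reconstructed from the URI above; the actual HDInsight file may contain more:

```xml
<!-- Reconstructed entry from the preloaded hive-site.xml -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://headnodehost:9083</value>
</property>
```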
I could not find a simple, working way to override that property value. (If you know of one, please comment.)
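For reference, this is the kind of override I tried; treat it as a sketch only, since in my case the preloaded `hive-site.xml` on the CLASSPATH still won:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("no-metastore"))
val hc = new HiveContext(sc)

// Attempt to force a local (embedded) metastore instead of the Thrift one:
// an empty hive.metastore.uris means "do not use a remote metastore".
// The Derby connection URL below is illustrative.
hc.setConf("hive.metastore.uris", "")
hc.setConf("javax.jdo.option.ConnectionURL",
  "jdbc:derby:;databaseName=/tmp/metastore_db;create=true")
```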
As a hacky workaround, I simply moved the `hive-site.xml` out of the way. Of course, this had to be done manually over RDP (which you have to enable on the head node first).