EMR Spark Thrift server create table: NoRouteToHost

Time: 2016-10-12 16:24:40

Tags: apache-spark hive apache-spark-sql emr

I am running Spark's Thrift server against a Hive metastore.

When I execute the following DDL through spark.sql:
create table if not exists test_table
    USING org.apache.spark.sql.parquet
    OPTIONS (
        path "s3n://parquet_folder/",
        mergeSchema "true")

the following stack trace is emitted. The key point is that the host it names (e.g. 172.31.8.86) does not exist.

java.net.NoRouteToHostException: No Route to Host from  ip-172-31-13-2/172.31.13.2 to ip-172-31-8-86.us-west-2.compute.internal:8020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:758)
  at org.apache.hadoop.ipc.Client.call(Client.java:1479)
  at org.apache.hadoop.ipc.Client.call(Client.java:1412)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
  at com.sun.proxy.$Proxy13.delete(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:540)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy14.delete(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044)
  at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707)
  at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:703)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply$mcV$sp(HiveExternalCatalog.scala:185)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:152)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:152)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
  at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:152)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:226)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableUtils$.createDataSourceTable(createDataSourceTables.scala:501)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:105)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
  ... 48 elided
Caused by: java.net.NoRouteToHostException: No route to host
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
  at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
  at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
  at org.apache.hadoop.ipc.Client.call(Client.java:1451)
  ... 87 more

2 answers:

Answer 0 (score: 0)

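The problem was that the external metastore had been created by a different EMR cluster. Apparently the Hive metastore retains cluster state (IP addresses).

The immediate fix was to drop the Hive database and rebuild it with /usr/lib/hive/bin/schematool.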
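
To check this diagnosis, it may help to pull the unreachable endpoint out of the exception message and compare it with a location recorded in the metastore. The following is a minimal Python sketch and not part of the original answer. The helper names and the `/user/hive/warehouse` path are assumptions for illustration; the message is the first line of the trace in the question.

```python
import re
from urllib.parse import urlparse

# First line of the stack trace from the question.
MESSAGE = (
    "java.net.NoRouteToHostException: No Route to Host from  "
    "ip-172-31-13-2/172.31.13.2 to "
    "ip-172-31-8-86.us-west-2.compute.internal:8020 failed on socket "
    "timeout exception: java.net.NoRouteToHostException: No route to host"
)


def unreachable_endpoint(message):
    """Return (host, port) of the destination named in the message, or None."""
    match = re.search(r"\bto\s+([\w.-]+):(\d+)\s+failed on", message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def points_at(location, host):
    """Return True if the URI's authority names the given host."""
    return urlparse(location).hostname == host


host, port = unreachable_endpoint(MESSAGE)

# Hypothetical warehouse location, as a metastore created on the old
# cluster might have recorded it (the path is an assumption).
stale_location = "hdfs://" + host + ":" + str(port) + "/user/hive/warehouse"

print(host, port)
print(points_at(stale_location, host))
print(points_at("s3n://parquet_folder/", host))
```

If a recorded location points at the host from the exception, the metastore is probably still carrying addresses from the cluster that created it.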

Answer 1 (score: 0)

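You can fix this without dropping the Hive database by running the following command:

hive --service metatool -updateLocation NEW-URL OLD-URL

OLD-URL can be retrieved using:

hive --service metatool -listFSRoot

NEW-URL is the domain of the new cluster master.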
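
Conceptually, `-updateLocation` swaps the old filesystem root for the new one in the locations the metastore has recorded. The Python sketch below only illustrates that prefix rewrite on an in-memory list; it is not Hive's implementation. The function name, the warehouse paths and the new host `new-master.example.internal` are assumptions, while the old host is the one from the stack trace.

```python
def update_location(locations, new_url, old_url):
    """Return a copy of locations with a leading old_url replaced by new_url.

    The argument order mirrors `metatool -updateLocation NEW-URL OLD-URL`.
    """
    updated = []
    for location in locations:
        if location.startswith(old_url):
            # Swap only the root; keep the rest of the path unchanged.
            updated.append(new_url + location[len(old_url):])
        else:
            # Locations on other filesystems (e.g. S3) are left alone.
            updated.append(location)
    return updated


OLD_URL = "hdfs://ip-172-31-8-86.us-west-2.compute.internal:8020"
NEW_URL = "hdfs://new-master.example.internal:8020"  # hypothetical new master

# Hypothetical recorded locations; the paths are assumptions.
recorded = [
    OLD_URL + "/user/hive/warehouse",
    OLD_URL + "/user/hive/warehouse/test_table",
    "s3n://parquet_folder/",
]

for location in update_location(recorded, NEW_URL, OLD_URL):
    print(location)
```

The metatool also accepts a `-dryRun` flag together with `-updateLocation`, which reports the changes without persisting them and is a reasonable first step before rewriting a shared metastore.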