Spark HBase connector throws an exception on connection

Date: 2019-05-13 16:02:59

Tags: apache-spark apache-spark-sql hbase spark-streaming

I am trying to connect to HBase from Spark, following the documentation provided by HBase:

https://hbase.apache.org/book.html#_sparksql_dataframes

Code:

    val cat =
      s"""{
         |"table":{"namespace":"test", "name":"data_inv"},
         |"rowkey":"key",
         |"columns":{
         |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
         |"col1":{"cf":"src_data", "col":"src_stream_desc", "type":"string"}
         |}
         |}""".stripMargin

    val spark = SparkSession
      .builder()
      .appName(getClass.toString)
      .getOrCreate()

    val hbaseContext = new HBaseContext(spark.sparkContext, spark.sparkContext.hadoopConfiguration)

    val df = withCatalog(cat, spark)

    df.printSchema()
    df.show(20, false)

    def withCatalog(cat: String, spark: SparkSession): DataFrame = {
      spark.sqlContext
        .read
        .options(Map(HBaseTableCatalog.tableCatalog -> cat))
        .format("org.apache.hadoop.hbase.spark")
        .load()
    }
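One thing worth checking (this is an assumption on my part, not something confirmed in the post): `spark.sparkContext.hadoopConfiguration` only contains HBase client settings if `hbase-site.xml` is on the driver/executor classpath. The connector examples typically build the configuration explicitly with `HBaseConfiguration.create`, which layers `hbase-site.xml` (ZooKeeper quorum, region server settings) on top of the Hadoop configuration; without it the client may fall back to localhost defaults. A minimal sketch:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext

// Sketch only: HBaseConfiguration.create(conf) loads hbase-default.xml and
// hbase-site.xml from the classpath on top of the given Hadoop configuration.
val hbaseConf = HBaseConfiguration.create(spark.sparkContext.hadoopConfiguration)
val hbaseContext = new HBaseContext(spark.sparkContext, hbaseConf)
```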

Using this API: https://github.com/apache/hbase-connectors/tree/master/spark

But I receive the following error message:

Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=69024: Call to hostname/ip:60020 failed on local exception: java.io.IOException: Connection closed row 'data_inv,,' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hostname,60020,1557761206926, seqNum=-1
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:158)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Call to hostname/ip:60020 failed on local exception: java.io.IOException: Connection closed
    at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:180)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
    at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
    at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
    at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:202)
    at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.channelInactive(NettyRpcDuplexHandler.java:210)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)

Can someone help resolve this error?

1 answer:

Answer 0 (score: 0)

It looks like your defaultFS is incorrect. Can you try this?

spark.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://IP:PORT")
val hbaseContext = new HBaseContext(spark.sparkContext, spark.sparkContext.hadoopConfiguration)
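If the filesystem setting turns out not to be the issue, the same configuration pattern can be used to point the client at the HBase ZooKeeper quorum explicitly. This is a hedged sketch: the hostnames and port below are placeholders, not values from the question, and these properties only matter if they are not already supplied by an `hbase-site.xml` on the classpath.

```scala
// Assumption: "zk1,zk2,zk3" and "2181" are placeholders for your cluster's
// ZooKeeper ensemble; substitute the real values.
spark.sparkContext.hadoopConfiguration.set("hbase.zookeeper.quorum", "zk1,zk2,zk3")
spark.sparkContext.hadoopConfiguration.set("hbase.zookeeper.property.clientPort", "2181")
val hbaseContext = new HBaseContext(spark.sparkContext, spark.sparkContext.hadoopConfiguration)
```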