scala: HBase: newAPIHadoopRDD does not work

Date: 2018-01-26 12:03:03

Tags: scala apache-spark hbase rdd

I am trying to read from a simple HBase table using the newAPIHadoopRDD method. Here is the code:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Point the input format at the table to scan
val tableName = "t1"
val hconf = HBaseConfiguration.create()
hconf.set(TableInputFormat.INPUT_TABLE, tableName)

// Build an RDD of (row key, row) pairs from the table
val hBaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println("records found : " + hBaseRDD.count())

The problem is that every action on the RDD (count in this case, but the same happens with, e.g., collect) never returns any error or success message; it just hangs, apparently stuck in a loop. Has anyone had the same problem, or does anyone know how to fix it?
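For what it's worth, a common cause of this exact symptom is that the HBase client inside each Spark task cannot reach the cluster, so it silently retries its ZooKeeper and region-server connections instead of failing, which looks like an endless hang. Below is a minimal sketch of setting the connection parameters explicitly before building the RDD; the ZooKeeper host zk1.example.com is a hypothetical placeholder, and the values must match the cluster actually being queried.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hconf = HBaseConfiguration.create()
// Hypothetical values for illustration; they must match the
// hbase-site.xml of the cluster actually being queried.
hconf.set("hbase.zookeeper.quorum", "zk1.example.com")
hconf.set("hbase.zookeeper.property.clientPort", "2181")
hconf.set(TableInputFormat.INPUT_TABLE, "t1")

If a correct hbase-site.xml is already on the driver and executor classpaths, HBaseConfiguration.create() picks these values up automatically and the explicit set calls are redundant.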

If I hit Ctrl-C, this is what I get:

scala> hBaseRDD.count
18/01/30 07:23:15 WARN repl.Signaling: Cancelling all active jobs, this can take a while. Press Ctrl+C again to exit now.
18/01/30 07:23:15 WARN spark.ExecutorAllocationManager: No stages are running, but numRunningTasks != 0
org.apache.spark.SparkException: Job 0 cancelled as part of cancellation of all jobs
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1430)
  at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1370)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:716)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:716)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:716)
  at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
  at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:716)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1623)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1600)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1589)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:623)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1930)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1943)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1956)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1970)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
  ... 48 elided
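
Note that this trace only shows the count job being cancelled by Ctrl-C; it does not reveal the root cause of the hang. One way to narrow the problem down is to scan the table with the plain HBase client API on the driver, outside of any Spark job. A rough sketch, assuming HBase 1.x-style client classes (ConnectionFactory and friends) are on the classpath:

import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
// If this call also hangs, the problem is HBase connectivity, not Spark.
val connection = ConnectionFactory.createConnection(conf)
try {
  val table = connection.getTable(TableName.valueOf("t1"))
  val scanner = table.getScanner(new Scan())
  // Print the first few row keys just to prove the scan works
  scanner.iterator().asScala.take(5).foreach(r => println(Bytes.toString(r.getRow)))
  scanner.close()
} finally connection.close()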

0 Answers

No answers yet.