All I have is the following:
// Read the HBase table as an RDD of (row key, row result) pairs
JavaPairRDD<ImmutableBytesWritable, Result> dataRDD = jsc.newAPIHadoopRDD(
        hbase_conf,
        TableInputFormat.class,
        org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
        org.apache.hadoop.hbase.client.Result.class);
sparkConf.log().info("Count of data = " + String.valueOf(dataRDD.count()));
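For context, hbase_conf is built roughly like this (a sketch; the table name and ZooKeeper quorum are placeholders, not values from my setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

// Sketch of the input configuration; "my_table" and "zk-host" are placeholders
Configuration hbase_conf = HBaseConfiguration.create();
hbase_conf.set("hbase.zookeeper.quorum", "zk-host");
hbase_conf.set(TableInputFormat.INPUT_TABLE, "my_table");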
And I get this exception:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, server-name): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2394)
Answer 0 (score 0):
You can find some hints here: https://issues.apache.org/jira/browse/SPARK-1867
Some people fixed this by replacing hadoop-client with the hadoop-common libs. Your case looks similar to mine, which needed the proper HBase jars. I added hbase/hbase-0.98.12/lib/* to spark.executor.extraClassPath (note that the actual config key is spelled with a capital P) and the error went away. Here is another way to supply these jars: https://groups.google.com/forum/#!topic/spark-users/gXSfbjauAjo
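A minimal sketch of setting this from the driver code (the path and app name are illustrative; use the lib directory of your own HBase installation):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Put the HBase jars on every executor's classpath
// (path is an assumption; adjust to your installation)
SparkConf conf = new SparkConf()
        .setAppName("hbase-read")
        .set("spark.executor.extraClassPath", "/opt/hbase/hbase-0.98.12/lib/*");
JavaSparkContext jsc = new JavaSparkContext(conf);

The same value can also be passed on the command line, e.g. spark-submit --conf spark.executor.extraClassPath='/opt/hbase/hbase-0.98.12/lib/*' ...; either way, it must be set before the executors launch, since the classpath is read at JVM startup.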