Spark-HBase error java.lang.IllegalStateException: unread block data

Date: 2016-01-20 13:25:56

Tags: apache-spark hbase apache-spark-sql

I am trying to fetch records from an HBase table through a Java Spark program exposed via a Jersey REST API, and I get the error mentioned below. However, when I access the HBase table by submitting the code as a Spark jar, it executes without errors.

I have 2 worker nodes for HBase and 2 worker nodes for Spark, maintained by the same master.


WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

3 Answers:

Answer 0 (score: 3)

Well, I think I know your problem, because I just went through it myself.

The cause is most likely some missing HBase jars: while the Spark job is running, Spark needs the HBase jars to read the data, and if they are not present on the classpath it throws this kind of exception. So what should you do? It's easy.

Before submitting the job, you need to add the --jars parameter with the following jars (a complete example command is shown after the list):

--jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
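
A complete spark-submit invocation might look like the sketch below (the --class value, application jar, and master URL are placeholders; substitute your own). Note that the --jars value must be a single comma-separated argument with no spaces:

spark-submit --class com.example.HBaseReadJob \
  --master spark://master:7077 \
  --jars /ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/guava-12.0.1.jar,/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,/ROOT/server/hbase/lib/htrace-core-2.04.jar \
  your-app.jar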

If that works for you, enjoy!

Answer 1 (score: 0)

I hit the same problem on CDH 5.4.0 when submitting a Spark job implemented with the Java API. Here are my solutions:

Solution 1: use spark-submit

--jars zookeeper-3.4.5-cdh5.4.0.jar, 
hbase-client-1.0.0-cdh5.4.0.jar, 
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// custom jars which are needed in the spark executors

Solution 2: use SparkConf in code

SparkConf sparkConf = new SparkConf();
sparkConf.setJars(new String[]{"zookeeper-3.4.5-cdh5.4.0.jar",
"hbase-client-1.0.0-cdh5.4.0.jar",
"hbase-common-1.0.0-cdh5.4.0.jar",
"hbase-server-1.0.0-cdh5.4.0.jar",
"hbase-protocol-1.0.0-cdh5.4.0.jar",
"htrace-core-3.1.0-incubating.jar",
// custom jars which are needed in the spark executors
});
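
To show where setJars fits, here is a minimal end-to-end sketch of a Java job that reads an HBase table (illustrative only: the app name, the table name "my_table", and the jar paths are assumptions; the read goes through the standard TableInputFormat from the HBase client API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadSketch {
    public static void main(String[] args) {
        // Ship the HBase jars to the executors so task deserialization can
        // resolve the HBase classes (the root cause of "unread block data").
        SparkConf sparkConf = new SparkConf()
                .setAppName("hbase-read-sketch")
                .setJars(new String[]{"zookeeper-3.4.5-cdh5.4.0.jar",
                        "hbase-client-1.0.0-cdh5.4.0.jar",
                        "hbase-common-1.0.0-cdh5.4.0.jar",
                        "hbase-server-1.0.0-cdh5.4.0.jar",
                        "hbase-protocol-1.0.0-cdh5.4.0.jar",
                        "htrace-core-3.1.0-incubating.jar"});
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Point the input format at the table to scan ("my_table" is a placeholder).
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table");

        // Read the table as an RDD of (row key, row) pairs and count the rows.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
                hbaseConf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
        System.out.println("row count: " + rows.count());

        sc.stop();
    }
}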

Summary
The problem is caused by jars missing from the Spark project. You need to add these jars to the project classpath and, in addition, use one of the 2 solutions above to distribute them to your Spark cluster.

Answer 2 (score: 0)

CDP / CDH:

Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory:

cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/

Step 2: Add the following libraries to spark-submit/spark-shell.

/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar

spark-shell:

sudo -u hive spark-shell --master yarn \
  --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar \
  --files /etc/spark/conf/hbase-site.xml

(The --jars value must be a single comma-separated argument with no spaces after the commas.)
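
A note on the design: --files ships hbase-site.xml into each executor's working directory, so the HBase client running inside the executors can find the ZooKeeper quorum; without it, executors typically fall back to default client settings and fail to connect to the cluster.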