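I am trying to fetch records from an HBase table from a Java Spark program that is invoked through a Jersey REST API, and I get the error shown below. When I access the HBase table by submitting the same code as a Spark jar, it runs without errors.
I have 2 worker nodes for HBase and 2 worker nodes for Spark, all managed by the same master.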
WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.16.140): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
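Answer 0 (score: 3)
The most likely cause is that some HBase jars are missing. While the job runs, Spark needs the HBase jars to read the data, and it throws exceptions like this one when they are not available. The fix is simple.
Before submitting the job, add the --jars parameter and pass the following jars:
--jars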
/ROOT/server/hive/lib/hive-hbase-handler-1.2.1.jar,
/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-server-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/hbase-hadoop2-compat-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/guava-12.0.1.jar,
/ROOT/server/hbase/lib/hbase-protocol-0.98.12-hadoop2.jar,
/ROOT/server/hbase/lib/htrace-core-2.04.jar
If that works for you, enjoy!
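Since --jars expects a single comma-separated value, a stray space or line break between entries can split the list. The following is a minimal sketch, not part of the original answer, of a helper that builds that value from a list of paths. The class name JarsArgumentSketch and the method toJarsArgument are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not from the answer above): builds the value for --jars.
final class JarsArgumentSketch {

    // Joins jar paths into one comma-separated string,
    // trimming surrounding whitespace and skipping blank entries.
    static String toJarsArgument(List<String> jarPaths) {
        return jarPaths.stream()
                .map(String::trim)
                .filter(path -> !path.isEmpty())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Two of the paths listed in the answer, used here as sample input.
        List<String> jars = List.of(
                "/ROOT/server/hbase/lib/hbase-client-0.98.12-hadoop2.jar",
                "/ROOT/server/hbase/lib/hbase-common-0.98.12-hadoop2.jar");
        System.out.println("--jars " + toJarsArgument(jars));
    }
}
```

The printed line can be pasted into a spark-submit command as is, provided the paths exist on the machine that submits the job.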
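Answer 1 (score: 0)
I ran into the same problem on CDH 5.4.0 when submitting a Spark job implemented with the Java API. Here are my solutions:
Solution 1: pass the jars to spark-submit (as one comma-separated value, shown here on separate lines for readability):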
--jars zookeeper-3.4.5-cdh5.4.0.jar,
hbase-client-1.0.0-cdh5.4.0.jar,
hbase-common-1.0.0-cdh5.4.0.jar,
hbase-server-1.0.0-cdh5.4.0.jar,
hbase-protocol-1.0.0-cdh5.4.0.jar,
htrace-core-3.1.0-incubating.jar,
// custom jars which are needed in the spark executors
Solution 2: use SparkConf in code:
SparkConf conf = new SparkConf().setJars(new String[]{
    "zookeeper-3.4.5-cdh5.4.0.jar",
    "hbase-client-1.0.0-cdh5.4.0.jar",
    "hbase-common-1.0.0-cdh5.4.0.jar",
    "hbase-server-1.0.0-cdh5.4.0.jar",
    "hbase-protocol-1.0.0-cdh5.4.0.jar",
    "htrace-core-3.1.0-incubating.jar",
    // custom jars which are needed in the spark executors
});
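Summary
The problem is caused by jars missing from the Spark project. You need to add these jars to the project classpath and, in addition, use one of the two solutions above to distribute them to your Spark cluster.
Answer 2 (score: 0)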
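To see which jars are actually missing from a given classpath, one option is to try loading a representative class from each jar. This is a minimal sketch under assumptions: the HBase class names in main are my guess at classes shipped in hbase-common, hbase-client and hbase-server for the versions mentioned above, and HBaseClasspathCheck is a hypothetical name. Run on the driver it only checks the driver classpath; the same method would have to be called inside a task to check the executors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic (not from the answer above): reports classes that cannot be loaded.
final class HBaseClasspathCheck {

    // Returns true if the class can be found on the current classpath.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, HBaseClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of class names that could not be loaded.
    static List<String> missing(List<String> classNames) {
        List<String> notFound = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Assumed class names; adjust them to the HBase version in use.
        List<String> expected = List.of(
                "org.apache.hadoop.hbase.HBaseConfiguration",
                "org.apache.hadoop.hbase.client.Result",
                "org.apache.hadoop.hbase.mapreduce.TableInputFormat");
        System.out.println("Missing classes: " + missing(expected));
    }
}
```

An empty list suggests the jars are visible where the check ran; any name that is printed points at a jar that still needs to be added or distributed.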
CDP / CDH:
Step 1: Copy the hbase-site.xml file into the /etc/spark/conf/ directory.
cp /opt/cloudera/parcels/CDH/lib/hbase/conf/hbase-site.xml /etc/spark/conf/
Step 2: Add the following libraries to spark-submit / spark-shell.
/opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar
/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar
/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar
spark-shell (the --jars value must be a single comma-separated argument with no spaces):
sudo -u hive spark-shell --master yarn --jars /opt/cloudera/parcels/CDH/jars/hive-hbase-handler-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-client-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-server-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat-*.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-*.jar,/opt/cloudera/parcels/CDH/jars/guava-28.1-jre.jar,/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar --files /etc/spark/conf/hbase-site.xml
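When the jars are passed from code rather than on the command line, for example from a Java driver, it may be safer to resolve wildcard patterns such as hbase-client-*.jar into concrete file paths first. Below is a minimal sketch using only the JDK; the class JarGlobSketch and the method matchingJars are hypothetical names, and the default directory and pattern in main are placeholders.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not from the answer above): expands a jar file-name pattern.
final class JarGlobSketch {

    // Returns the absolute paths of files in dir whose names match the glob, sorted.
    static List<String> matchingJars(Path dir, String glob) {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path path : stream) {
                jars.add(path.toAbsolutePath().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real directory and pattern as arguments.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "hbase-*.jar";
        System.out.println(String.join(",", matchingJars(dir, glob)));
    }
}
```

For example, running it with /opt/cloudera/parcels/CDH/lib/hbase/ and hbase-*.jar as arguments would print every matching jar in that directory as one comma-separated line, which may include more jars than the list above.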