Loading an HBase table into Spark

Date: 2015-01-06 06:31:32

Tags: hbase apache-spark

I am trying the example at https://www.mapr.com/developercentral/code/loading-hbase-tables-spark#.VKtxqivF_fS. The table is getting created, and I can see the inserted rows when I check through the HBase shell. But the next step, creating an RDD and then counting it, produces the following error. Any help is appreciated.

java.lang.IllegalStateException: unread block data
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
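For context, the failing step follows the usual HBase-on-Spark pattern via newAPIHadoopRDD. Below is a minimal sketch of that step, assuming a hypothetical table name "mytable" (the tutorial's own table name and class names may differ):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SimpleApp"))

    // Point TableInputFormat at the HBase table; "mytable" is a placeholder.
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "mytable")

    // This is the step that throws "unread block data" when the executors
    // do not have the HBase classes on their classpath.
    val hBaseRDD = sc.newAPIHadoopRDD(conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println("Number of rows: " + hBaseRDD.count())
    sc.stop()
  }
}
```

The exception is raised during task deserialization on the executors, which is why it appears even though the same code works on the driver side.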

1 Answer:

Answer 0 (score: 1)

Passing the HBase jar files to the workers with the --jars option resolved the problem. I had previously used only --driver-class-path.

As shown below:

spark-submit --master spark://sparkhost:7077 \
 --class SimpleApp \
 --jars /home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-client-0.98.7-hadoop2.jar,\
/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-server-0.98.7-hadoop2.jar,\
/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-protocol-0.98.7-hadoop2.jar,\
/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-common-0.98.7-hadoop2.jar,\
/home/hadoop/BigDataEDW/htrace-core-2.04.jar\
 /home/hadoop/BigDataEDW/hbase-spark_2.10-1.0.0-SNAPSHOT.jar
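Equivalently, the jars can be listed once in spark-defaults.conf via the standard spark.jars property, which ships them to both the driver and the executors; this is a sketch assuming the same file locations as above:

```
spark.jars /home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-client-0.98.7-hadoop2.jar,/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-server-0.98.7-hadoop2.jar,/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-protocol-0.98.7-hadoop2.jar,/home/hadoop/Spark/hbase-0.98.7-hadoop2/lib/hbase-common-0.98.7-hadoop2.jar,/home/hadoop/BigDataEDW/htrace-core-2.04.jar
```

This avoids repeating the list on every spark-submit invocation. By contrast, --driver-class-path affects only the driver JVM, which is why it was not sufficient here.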