Any Spark job I run that involves HBase access results in the error below. My own job is in Scala, but the Python example provided ends the same way. The cluster is Cloudera, running CDH 5.4.4. The same job runs fine on a different cluster with CDH 5.3.1.
Any help is much appreciated!
...
15/08/15 21:46:30 WARN TableInputFormatBase: initializeTable called multiple times. Overwriting connection and table reference; TableInputFormatBase will not close these old references when done.
...
15/08/15 21:46:32 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, some.server.name): java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:163)
...
Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:389)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.createRecordReader(TableInputFormatBase.java:158)
... 14 more
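
For context, here is a minimal sketch of the kind of Scala job that triggers this, assuming the usual newAPIHadoopRDD + TableInputFormat read path (the table name and app name are placeholders, not the exact failing job):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-read-sketch"))

    // Standard TableInputFormat configuration; "some_table" is a placeholder.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "some_table")

    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Tasks reading this RDD call TableInputFormatBase.createRecordReader on the
    // executors; that is where the IllegalStateException above is raised.
    println(rdd.count())
    sc.stop()
  }
}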
Answer 0 (score: 2):
Run spark-shell with the following parameters:

--driver-class-path .../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar --driver-java-options "-Dspark.executor.extraClassPath=.../cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar"

Why this works is described here.
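
For a compiled Scala job submitted with spark-submit rather than spark-shell, the same classpath fix can be expressed with the flags below. This is a sketch that assumes the default CDH parcel location (/opt/cloudera/parcels/CDH); MyHBaseJob and my-hbase-job.jar are hypothetical placeholders for your own application:

# htrace-core must be visible to both the driver and every executor JVM
spark-submit \
  --class MyHBaseJob \
  --driver-class-path /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar \
  --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar \
  my-hbase-job.jar

The --driver-class-path flag puts htrace-core on the driver's classpath, and spark.executor.extraClassPath (effectively the same setting the -Dspark.executor.extraClassPath driver option applies above) does the same for the executors, where TableInputFormatBase actually creates its record readers.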