I am using Kafka and Spark to stream data updates into my HBase table, but I keep getting OffsetOutOfRangeException. Here is my code:
new KafkaStreamBuilder()
    .setStreamingContext(streamingContext)
    .setTopics(topics)
    .setDataSourceId(dataSourceId)
    .setOffsetManager(offsetManager)
    .setConsumerParameters(
        ImmutableMap.<String, String>builder()
            .putAll(kafkaConsumerParams)
            .put("group.id", groupId)
            .put("metadata.broker.list", kafkaBroker)
            .build())
    .build()
    .foreachRDD(rdd -> {
        rdd.foreachPartition(iter -> {
            // One HBase Table instance per partition.
            final Table hTable = createHbaseTable(settings);
            try {
                while (iter.hasNext()) {
                    String json = new String(iter.next());
                    try {
                        putRow(hTable, json, settings, barrier);
                    } catch (Exception e) {
                        throw new RuntimeException("hbase write failure", e);
                    }
                }
            } catch (OffsetOutOfRangeException e) {
                throw new RuntimeException("encountered OffsetOutOfRangeException: ", e);
            }
        });
    });
I have the streaming job scheduled to run every 5 minutes. Each time the consumer finishes a batch, it writes the latest markers and a checkpoint to S3. Before the next run of the streaming job starts, it reads the previous checkpoint and markers back from S3 and resumes consuming from there.
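To make that save/restore cycle concrete, here is a simplified sketch of what happens around each batch. It is not my actual KafkaStreamBuilder code: writeOffsetsToS3 and readOffsetsFromS3 are placeholders for what my OffsetManager really does against S3, and HasOffsetRanges / OffsetRange / TopicAndPartition are the spark-streaming-kafka 0.8 classes that show up in the stack trace.

// Simplified sketch of the per-batch offset checkpointing described above.
// writeOffsetsToS3 / readOffsetsFromS3 are placeholders for the real OffsetManager logic.
import java.util.HashMap;
import java.util.Map;
import kafka.common.TopicAndPartition;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;

public class OffsetCheckpointSketch {

    // Called from foreachRDD: record where this batch ended, after it has been processed.
    static void checkpointBatch(JavaRDD<byte[]> rdd) {
        // Each RDD produced by the direct Kafka stream knows its exact offset ranges.
        OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
        writeOffsetsToS3(ranges);
    }

    // Called before the next run: rebuild the fromOffsets map the stream starts from.
    static Map<TopicAndPartition, Long> resumePoint() {
        Map<TopicAndPartition, Long> fromOffsets = new HashMap<>();
        for (OffsetRange range : readOffsetsFromS3()) {
            fromOffsets.put(new TopicAndPartition(range.topic(), range.partition()),
                            range.untilOffset());
        }
        return fromOffsets;
    }

    // Placeholders for the S3-backed marker/checkpoint store.
    static void writeOffsetsToS3(OffsetRange[] ranges) { /* ... */ }
    static OffsetRange[] readOffsetsFromS3() { /* ... */ return new OffsetRange[0]; }
}

The idea is that each partition's untilOffset from the last successful batch becomes the starting offset of the next run.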
Here is the exception stack trace:
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:925)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1944)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: kafka.common.OffsetOutOfRangeException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:86)
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.handleFetchErr(KafkaRDD.scala:188)
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.fetchBatch(KafkaRDD.scala:197)
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:212)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
What I have done: I have verified that both the markers and the checkpoints work as expected.
So I am a bit lost here: how can this exception happen, and what are the possible/reasonable solutions?
Thanks!