ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

Date: 2017-07-15 07:54:00

Tags: apache-spark apache-kafka spark-streaming

We are calling a Spark SQL job from Spark Streaming. We get a concurrency exception, and the Kafka consumer is closed. Here are the code and exception details:

  

Kafka consumer code

...

    // Start reading messages from Kafka and get DStream
    final JavaInputDStream<ConsumerRecord<String, byte[]>> consumerStream = KafkaUtils.createDirectStream(
            getJavaStreamingContext(), LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, byte[]>Subscribe(SparkServiceConfParams.AIR.CONSUME_TOPICS,
                    sparkServiceConf.getKafkaConsumeParams()));

    ThreadContext.put(Constants.CommonLiterals.LOGGER_UID_VAR, CommonUtils.loggerUniqueId());

    // Decode each binary message and generate JSON array
    JavaDStream<String> decodedStream = messagesStream.map(new Function<byte[], String>() {
        @Override
        public String call(byte[] message) throws Exception {
            // actual decoding logic elided in the original post
            return new String(message);
        }
    });
  

Error details

    // publish generated json gzip to kafka
    decodedStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
        private static final long serialVersionUID = 1L;

        @Override
        public void call(JavaRDD<String> jsonRdd4DF) throws Exception {
            //Dataset<Row> json = sparkSession.read().json(jsonRdd4DF);
            if (!jsonRdd4DF.isEmpty()) {
                //JavaRDD<String> jsonRddDF = getJavaSparkContext().parallelize(jsonRdd4DF.collect());
                Dataset<Row> json = sparkSession.read().json(jsonRdd4DF);

                SparkAIRMainJsonProcessor airMainJsonProcessor = new SparkAIRMainJsonProcessor();
                AIRDataSetBean processAIRData = airMainJsonProcessor.processAIRData(json, sparkSession);
            }
        }
    });

Finally, the Kafka consumer is closed:

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
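For context on this exception: KafkaConsumer deliberately refuses concurrent use. Each public method first runs an internal acquire() that records the owning thread and throws ConcurrentModificationException if a different thread enters while the first still holds it. The sketch below is a simplified, illustrative version of that guard (the class name `SingleThreadGuard` and helper `secondThreadGetsCme` are made up for this example; the real logic lives inside `org.apache.kafka.clients.consumer.KafkaConsumer`):

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of KafkaConsumer's single-thread ownership check.
public class SingleThreadGuard {
    private static final long NO_OWNER = -1L;
    private final AtomicLong ownerThreadId = new AtomicLong(NO_OWNER);
    private int refCount = 0;

    // acquire(): CAS the current thread id in; any other thread gets a CME,
    // which is exactly the error message seen in the question.
    public void acquire() {
        long self = Thread.currentThread().getId();
        if (ownerThreadId.get() != self && !ownerThreadId.compareAndSet(NO_OWNER, self)) {
            throw new ConcurrentModificationException(
                    "KafkaConsumer is not safe for multi-threaded access");
        }
        refCount++;
    }

    // release(): once the owning thread fully exits, another thread may enter.
    public void release() {
        if (--refCount == 0) {
            ownerThreadId.set(NO_OWNER);
        }
    }

    // Helper: returns true iff a freshly started thread fails to acquire the guard.
    public static boolean secondThreadGetsCme(SingleThreadGuard guard) {
        final boolean[] threw = {false};
        Thread other = new Thread(() -> {
            try {
                guard.acquire();
            } catch (ConcurrentModificationException e) {
                threw[0] = true;
            }
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threw[0];
    }

    public static void main(String[] args) {
        SingleThreadGuard guard = new SingleThreadGuard();
        guard.acquire(); // main thread now "owns" the consumer
        System.out.println("second thread got CME: " + secondThreadGetsCme(guard));
        guard.release();
    }
}
```

This is why any code path that makes two threads touch the same consumer concurrently, such as two Spark jobs re-reading the same Kafka-backed RDD, produces the error above.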

1 Answer:

Answer 0 (score: 1)

This issue was resolved using Spark Streaming's cache (or persist) option. In that case the cached RDD is reused instead of being read from Kafka again, which resolves the problem and supports concurrent use of the stream. But use the cache option judiciously. Here is the code:

    JavaDStream<ConsumerRecord<String, byte[]>> cache = consumerStream.cache();
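Why caching helps: without it, every action on the stream re-evaluates the RDD lineage, i.e. reads from the underlying Kafka consumer again, and when Spark runs those reads on different threads the consumer's single-thread guard trips. Caching materializes the records once, so later actions reuse the stored data instead of touching the consumer. A plain-Java analogy using a memoizing supplier (the `CacheAnalogy` class and its names are illustrative, not Spark or Kafka API):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Analogy for DStream.cache(): an uncached computation re-runs its source on
// every access; a cached one materializes the result once and reuses it.
public class CacheAnalogy {
    // Wrap a supplier so its source is evaluated at most once.
    static <T> Supplier<T> cache(Supplier<T> source) {
        return new Supplier<T>() {
            private T value;
            private boolean done;

            @Override
            public synchronized T get() {
                if (!done) {
                    value = source.get(); // the single "read from Kafka"
                    done = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger kafkaReads = new AtomicInteger();

        // Uncached: two "actions" trigger two reads from the source.
        Supplier<String> uncached = () -> "batch-" + kafkaReads.incrementAndGet();
        uncached.get();
        uncached.get();
        System.out.println("uncached reads: " + kafkaReads.get()); // prints 2

        // Cached: two "actions" share a single read.
        kafkaReads.set(0);
        Supplier<String> cached = cache(() -> "batch-" + kafkaReads.incrementAndGet());
        cached.get();
        cached.get();
        System.out.println("cached reads: " + kafkaReads.get()); // prints 1
    }
}
```

In the Spark code above, the same idea means the Kafka consumer is hit once per batch regardless of how many downstream actions run, which removes the concurrent access.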