MongoDB & Spark: "com.mongodb.MongoSocketReadException: Prematurely reached end of stream"

Date: 2017-11-22 14:53:08

Tags: java mongodb apache-spark apache-kafka

I have a Java application that consumes a stream of Avro messages from Kafka and, for each message, runs a query against a MongoDB collection.

After correctly processing a few dozen messages, the application stops and throws "com.mongodb.MongoSocketReadException: Prematurely reached end of stream".

Here is the code:

    JavaPairInputDStream<String, byte[]> directKafkaStream = KafkaUtils.createDirectStream(jsc,
            String.class, byte[].class, StringDecoder.class, DefaultDecoder.class, kafkaParams, topics);

    directKafkaStream.foreachRDD(rdd ->{

        rdd.foreach(avroRecord -> {

            byte[] encodedAvroData = avroRecord._2;
            LocationType t = deserialize(encodedAvroData);

            MongoClientOptions.Builder options_builder = new MongoClientOptions.Builder();
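            // maxConnectionIdleTime(60000): pooled connections idle for more than 60 seconds are closed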
            options_builder.maxConnectionIdleTime(60000);
            MongoClientOptions options = options_builder.build();
            MongoClient mongo = new MongoClient ("localhost:27017", options);

            MongoDatabase database = mongo.getDatabase("DB");
            MongoCollection<Document> collection = database.getCollection("collection");

            Document myDoc = collection.find(eq("key", 4)).first();
            System.out.println(myDoc);

        });
    });

1 Answer:

Answer 0 (score: 0):

First of all, you should not open a Mongo connection for every record! And you should close every Mongo connection you open.

Mongo does not like it when you open many connections (hundreds? thousands?) without closing them.

Here is an example of how you can open a single Mongo connection per RDD partition:

    directKafkaStream.foreachRDD(rdd -> {
        rdd.foreachPartition(it -> {

            // Opens only 1 connection per partition
            MongoClient mongo = new MongoClient("localhost:27017");
            MongoDatabase database = mongo.getDatabase("DB");
            MongoCollection<Document> collection = database.getCollection("collection");

            while (it.hasNext()) {
                byte[] encodedAvroData = it.next()._2;
                LocationType t = deserialize(encodedAvroData);

                Document myDoc = collection.find(eq("key", 4)).first();
                System.out.println(myDoc);
            }

            mongo.close();
        });
    });
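
Note that if `deserialize` or the query throws, the `mongo.close()` call above is skipped and the connection leaks. Below is a minimal sketch of the same per-partition loop with a try/finally guard so the client is always released; it reuses the connection string, database, collection, and the `deserialize` helper and `LocationType` type from the question:

    directKafkaStream.foreachRDD(rdd -> {
        rdd.foreachPartition(it -> {

            // One client per partition, as above
            MongoClient mongo = new MongoClient("localhost:27017");
            try {
                MongoCollection<Document> collection = mongo
                        .getDatabase("DB")
                        .getCollection("collection");

                while (it.hasNext()) {
                    LocationType t = deserialize(it.next()._2);

                    Document myDoc = collection.find(eq("key", 4)).first();
                    System.out.println(myDoc);
                }
            } finally {
                // Always release the connection, even if deserialization or the query fails
                mongo.close();
            }
        });
    });

The try/finally keeps the connection's lifetime tied to the partition even when an individual record fails.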