I have a Java application that processes a stream of Avro messages from Kafka and, for each message, runs a query against a MongoDB collection.
After correctly processing a few dozen messages, the application stops and throws "com.mongodb.MongoSocketReadException: Prematurely reached end of stream".
Here is the code:
JavaPairInputDStream<String, byte[]> directKafkaStream = KafkaUtils.createDirectStream(jsc,
        String.class, byte[].class, StringDecoder.class, DefaultDecoder.class, kafkaParams, topics);

directKafkaStream.foreachRDD(rdd -> {
    rdd.foreach(avroRecord -> {
        byte[] encodedAvroData = avroRecord._2;
        LocationType t = deserialize(encodedAvroData);

        MongoClientOptions.Builder options_builder = new MongoClientOptions.Builder();
        options_builder.maxConnectionIdleTime(60000);
        MongoClientOptions options = options_builder.build();
        MongoClient mongo = new MongoClient("localhost:27017", options);
        MongoDatabase database = mongo.getDatabase("DB");
        MongoCollection<Document> collection = database.getCollection("collection");
        Document myDoc = collection.find(eq("key", 4)).first();
        System.out.println(myDoc);
    });
});
Answer 0 (score: 0)
First, you should not open a Mongo connection for every record. And you should close each connection when you are done with it: Mongo does not cope well with a large number (hundreds? thousands?) of connections being opened and never closed.
Here is an example that opens one Mongo connection per RDD partition instead:
directKafkaStream.foreachRDD(rdd -> {
    rdd.foreachPartition(it -> {
        // Open only one connection per partition, not one per record
        MongoClient mongo = new MongoClient("localhost:27017");
        try {
            MongoDatabase database = mongo.getDatabase("DB");
            MongoCollection<Document> collection = database.getCollection("collection");
            while (it.hasNext()) {
                byte[] encodedAvroData = it.next()._2;
                LocationType t = deserialize(encodedAvroData);
                Document myDoc = collection.find(eq("key", 4)).first();
                System.out.println(myDoc);
            }
        } finally {
            // Close the connection even if deserialization or the query throws
            mongo.close();
        }
    });
});
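A further refinement (a sketch, not part of the answer above): since MongoClient is thread-safe and maintains its own connection pool, you can go one step beyond one-connection-per-partition and share a single lazily created client per executor JVM, using the initialization-on-demand holder idiom. The ConnectionHolder class and its stand-in Connection type below are hypothetical names used so the sketch compiles without the MongoDB driver; in real code the held field would be a com.mongodb.MongoClient.

```java
// Sketch: one shared client per executor JVM via the holder idiom.
// "Connection" is a stand-in for com.mongodb.MongoClient so this
// example is self-contained; the lazy-singleton pattern is the point.
public class ConnectionHolder {

    static class Connection {
        final String uri;
        Connection(String uri) { this.uri = uri; }
    }

    // The JVM runs this static initializer exactly once, lazily and
    // thread-safely, the first time Lazy is referenced.
    private static class Lazy {
        static final Connection INSTANCE = new Connection("mongodb://localhost:27017");
    }

    public static Connection get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) {
        // Every task running in this JVM gets the same instance, so
        // all partitions on one executor share a single client.
        System.out.println(ConnectionHolder.get() == ConnectionHolder.get()); // prints "true"
    }
}
```

With this pattern you would not call close() inside foreachPartition at all; the client lives for the lifetime of the executor, which is exactly what a pooled MongoClient is designed for.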