Kafka consumer missing records

Date: 2014-11-06 06:06:27

Tags: apache-kafka spark-streaming

There is a problem between Kafka and Spark Streaming. I have a production service with fairly low traffic (around 12,000-15,000 records/second). At first, consumption looks normal, but after 10-15 minutes the consumption rate suddenly drops to roughly 1/10 of what it was. Could this be a network traffic problem?

Kafka configuration:
    num.network.threads = 2
    num.io.threads = 8
    socket.send.buffer.bytes = 1048576
    socket.receive.buffer.bytes = 1048576
    socket.request.max.bytes = 104857600
    log.flush.interval.messages = 10000
    log.flush.interval.ms = 1000
    log.retention.hours = 12
    log.segment.bytes = 536870912
    log.retention.check.interval.ms = 60000
    log.cleaner.enable = false
    log.cleanup.interval.mins = 1
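
One thing worth noting, given the title: with log.cleaner.enable = false and log.retention.hours = 12, segments older than 12 hours are simply deleted, so a consumer that keeps falling behind will eventually be pointing at data that no longer exists. A back-of-envelope sketch in Scala, using the (assumed, not measured) rates from the question:

    // If production stays at ~15,000 rec/s while consumption drops to ~1/10,
    // how long until the consumer lags past the 12-hour retention window?
    val produceRate = 15000.0                    // records/s written to Kafka (assumed)
    val consumeRate = produceRate / 10           // records/s after the slowdown (assumed)
    val lagGrowth   = produceRate - consumeRate  // backlog growth in records/s
    val windowRecs  = produceRate * 12 * 3600    // records covered by 12h of retention
    val hoursToLoss = windowRecs / lagGrowth / 3600
    println(f"backlog grows at $lagGrowth%.0f rec/s; unread data starts expiring after ~$hoursToLoss%.1f h")

Under those assumptions the oldest unread records start being deleted after roughly 13 hours, which would turn a throughput problem into genuinely missing records.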

Spark Streaming (consumer) configuration:

....
val kafkaParams = Map(
    "zookeeper.connect" -> zkQuorum,
    "group.id" -> group,
    "zookeeper.connection.timeout.ms" -> "1000000",
    "zookeeper.sync.time.ms" -> "200",
    "fetch.message.max.bytes" -> "2097152000",
    "queued.max.message.chunks" -> "1000",
    "auto.commit.enable" -> "true",
    "auto.commit.interval.ms" -> "1000")

try {
    KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics.map((_, partition)).toMap,
      StorageLevel.MEMORY_ONLY).map {
      case (key, value) => convertTo(key, value)
    }.filter {
      _ != null
    }.foreachRDD(line => saveToHBase(line, INPUT_TABLE))
    //}.foreachRDD(line => logger.info("handling testing....."+ line))
  } catch {
    case e: Exception => logger.error("consumerEx: ", e) // pass the Throwable so the stack trace is actually logged
  }
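
To confirm the drop is happening on the consumer side rather than in the network, it may help to log how many records each micro-batch actually delivers. A minimal diagnostic sketch, assuming the same ssc, kafkaParams, topics, partition and logger as in the code above:

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics.map((_, partition)).toMap, StorageLevel.MEMORY_ONLY)

    // DStream.count() emits a single Long per batch interval; logging it over
    // time shows exactly when, and how sharply, the throughput falls off.
    stream.count().foreachRDD { rdd =>
      rdd.collect().foreach(n => logger.info("records in this batch: " + n))
    }

If the per-batch count drops while the producers keep writing at the same rate, the bottleneck is on the consuming side.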

1 Answer:

Answer 0 (score: 0)

It might be GC pause time. Check this: http://ingest.tips/2015/01/21/handling-large-messages-kafka/
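
One way to test the GC theory is to turn on GC logging for the executors. A sketch of how that could be wired into the driver's SparkConf (the flags are standard HotSpot options, nothing Spark-specific):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-consumer") // hypothetical app name
      // Print every GC event with timestamps to the executor stdout logs, so
      // long stop-the-world pauses can be lined up with the throughput drop.
      .set("spark.executor.extraJavaOptions",
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")

It is also worth noting that fetch.message.max.bytes is set to roughly 2 GB above, and with queued.max.message.chunks = 1000 the high-level consumer can, in the worst case, buffer very large amounts of data in memory; that is exactly the kind of pressure that produces long GC pauses, and is the sort of large-message sizing issue the linked article covers.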