Consumer can't consume all the messages from the producer

Date: 2020-06-01 08:25:16

Tags: python docker apache-kafka

I created 3 separate containers with docker-compose: the 1st for Kafka, the 2nd for the Producer (Streamer), and the last one for the Consumer:

    version: "3"

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    networks:
      - stream-network
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    networks: 
        - stream-network
  streamer:
    build:
      context: ./streamingProducer/
    networks: 
      - stream-network
    depends_on:
      - kafka
  consumer:
    build:
      context: ./streamingConsumer/
    networks: 
      - stream-network
    depends_on:
      - kafka

I generate 10 messages with the producer in its container; here is the code:

    from confluent_kafka import Producer
    import pprint
    from faker import Faker
    # from bson.json_util import dumps
    import time


    def delivery_report(err, msg):
        """ Called once for each message produced to indicate delivery result.
            Triggered by poll() or flush(). """
        if err is not None:
            print('Message delivery failed: {}'.format(err))
        else:
            print('Message delivered to {} [{}]'.format(msg.topic(), msg.partition()))


    # Generating fake data
    myFactory = Faker()
    myFactory.random.seed(5467)

    for i in range(10):

        data = myFactory.name()
        print("data: ", data)

        # Produce sample message from localhost
        # producer = KafkaProducer(bootstrap_servers=['localhost:9092'], retries=5)
        # Produce message from docker
        producer = Producer({'bootstrap.servers': 'kafka:29092'})

        producer.poll(0)

        # producer.send('live-transactions', dumps(data).encode('utf-8'))
        producer.produce('mytopic', data.encode('utf-8'))

    # block until all async messages are sent
    producer.flush()
    # tidy up the producer connection
    # producer.close()
    time.sleep(0.5)

Here are the 10 output messages:

    streamer_1   | producer.py:35: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
    streamer_1   |   producer.produce('mytopic', data.encode('utf-8'))
    streamer_1   | data:  Denise Reed
    streamer_1   | data:  Megan Douglas
    streamer_1   | data:  Philip Obrien
    streamer_1   | data:  William Howell
    streamer_1   | data:  Michael Williamson
    streamer_1   | data:  Cheryl Jackson
    streamer_1   | data:  Janet Bruce
    streamer_1   | data:  Colton Martin
    streamer_1   | data:  David Melton
    streamer_1   | data:  Paula Ingram

When I try to consume the messages with the consumer, it only consumes the last message, in this case Paula Ingram, and then the program keeps running forever like an infinite loop. I don't know what's wrong. Here is the consumer code:

    from kafka.consumer import KafkaConsumer

    try:
        print('Welcome to parse engine')
        # From inside a container
        # consumer = KafkaConsumer('test-topic', bootstrap_servers='kafka:29092')
        # From localhost
        consumer = KafkaConsumer('mytopic', bootstrap_servers='localhost:9092', auto_offset_reset='earliest')
        for message in consumer:
            print('im a message')
            print(message.value.decode("utf-8"))

    except Exception as e:
        print(e)
        # Logs the error appropriately.
        pass

Any help would be greatly appreciated. Thanks.

1 answer:

Answer 0 (score: 0)

I suspect you are running into a consumer group problem.

auto_offset_reset='earliest' only applies to a group that does not exist yet. If the group already exists, it resumes from its previously committed position.
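
For example, here is a minimal sketch (assuming kafka-python, with a hypothetical group name) showing how a group_id that has never committed offsets makes auto_offset_reset='earliest' actually take effect; reusing an existing group would instead resume from its committed position:

    import uuid

    from kafka import KafkaConsumer

    # 'parse-engine-...' is a hypothetical group name; any group_id with
    # no committed offsets falls back to auto_offset_reset.
    consumer = KafkaConsumer(
        'mytopic',
        bootstrap_servers='localhost:9092',
        group_id='parse-engine-{}'.format(uuid.uuid4()),  # fresh group every run
        auto_offset_reset='earliest',  # applies because the group is new
    )
    for message in consumer:
        print(message.value.decode('utf-8'))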


If that's not the case, it's not clear in what order you ran the consumer and the producer, but I would start the consumer first and then run docker-compose restart streamer a few times.
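
If you instead want the consumer to re-read the topic from the beginning regardless of any committed group offsets, a minimal kafka-python sketch (assuming mytopic has a single partition, 0) is to assign the partition manually and seek to its start:

    from kafka import KafkaConsumer, TopicPartition

    # No group_id: the consumer stands alone and commits nothing
    consumer = KafkaConsumer(bootstrap_servers='localhost:9092')

    # Assumption: mytopic has a single partition, 0
    tp = TopicPartition('mytopic', 0)
    consumer.assign([tp])
    consumer.seek_to_beginning(tp)

    for message in consumer:
        print(message.value.decode('utf-8'))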