Can't get JSON messages from Kafka using kafka-python's deserializer

Time: 2017-04-07 16:21:44

Tags: python json kafka-python

I am trying to send a very simple JSON object through Kafka and read it out on the other side using Python and kafka-python. However, I keep seeing an error.

I did some research, and the most common cause of this error is malformed JSON. I tried printing the JSON before sending it by adding the following to my code, and the JSON prints without errors:

while True:
    json_obj1 = json.dumps({"dataObjectID": "test1"})
    print json_obj1
    producer.send('my-topic', {"dataObjectID": "test1"})
    producer.send('my-topic', {"dataObjectID": "test2"})
    time.sleep(1)

This leads me to suspect that I can produce the JSON but not consume it.

Here is my code:

import threading
import logging
import time
import json

from kafka import KafkaConsumer, KafkaProducer


class Producer(threading.Thread):
    daemon = True

    def run(self):
        producer = KafkaProducer(bootstrap_servers='localhost:9092',
                                 value_serializer=lambda v: json.dumps(v).encode('utf-8'))

        while True:
            producer.send('my-topic', {"dataObjectID": "test1"})
            producer.send('my-topic', {"dataObjectID": "test2"})
            time.sleep(1)


class Consumer(threading.Thread):
    daemon = True

    def run(self):
        consumer = KafkaConsumer(bootstrap_servers='localhost:9092',
                                 auto_offset_reset='earliest',
                                 value_deserializer=lambda m: json.loads(m).decode('utf-8'))
        consumer.subscribe(['my-topic'])

        for message in consumer:
            print (message)


def main():
    threads = [
        Producer(),
        Consumer()
    ]

    for t in threads:
        t.start()

    time.sleep(10)

if __name__ == "__main__":
    logging.basicConfig(
        format='%(asctime)s.%(msecs)s:%(name)s:%(thread)d:' +
               '%(levelname)s:%(process)d:%(message)s',
        level=logging.INFO
    )
    main()

If I remove value_serializer and value_deserializer, I can send and receive strings successfully. When I run that code, I can see the JSON that I am sending. Here is a short snippet of the output:

ConsumerRecord(topic=u'my-topic', partition=0, offset=5742, timestamp=None, timestamp_type=None, key=None, value='{"dataObjectID": "test1"}', checksum=-1301891455, serialized_key_size=-1, serialized_value_size=25)
ConsumerRecord(topic=u'my-topic', partition=0, offset=5743, timestamp=None, timestamp_type=None, key=None, value='{"dataObjectID": "test2"}', checksum=-1340077864, serialized_key_size=-1, serialized_value_size=25)
ConsumerRecord(topic=u'my-topic', partition=0, offset=5744, timestamp=None, timestamp_type=None, key=None, value='test', checksum=1495943047, serialized_key_size=-1, serialized_value_size=4)
ConsumerRecord(topic=u'my-topic', partition=0, offset=5745, timestamp=None, timestamp_type=None, key=None, value='\xc2Hello, stranger!', checksum=-1090450220, serialized_key_size=-1, serialized_value_size=17)
ConsumerRecord(topic=u'my-topic', partition=0, offset=5746, timestamp=None, timestamp_type=None, key=None, value='test', checksum=1495943047, serialized_key_size=-1, serialized_value_size=4)
ConsumerRecord(topic=u'my-topic', partition=0, offset=5747, timestamp=None, timestamp_type=None, key=None, value='\xc2Hello, stranger!', checksum=-1090450220, serialized_key_size=-1, serialized_value_size=17)

So I tried removing only value_deserializer from the consumer. That code runs, but without the deserializer the messages come out as strings, which is not what I need. So why doesn't value_deserializer work? Is there a different way I should be getting JSON out of a Kafka message?

3 Answers:

Answer 0 (score: 4)

Decoding the message as UTF-8 first and then passing the result to json.loads solved my problem:

value_deserializer=lambda m: json.loads(m.decode('utf-8'))

instead of:

value_deserializer=lambda m: json.loads(m).decode('utf-8')

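Hopefully the same idea carries over to the producer side, where the order is reversed: json.dumps first, then encode('utf-8').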
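The ordering issue can be checked without a broker. The sketch below is only an illustration: the function names `serialize`, `deserialize`, and `broken_deserialize` are hypothetical stand-ins for the lambdas in the question, and it assumes Python 3.6 or later, where json.loads also accepts bytes.

```python
import json


def serialize(value):
    # Producer side: Python object -> JSON text -> UTF-8 bytes.
    return json.dumps(value).encode('utf-8')


def deserialize(raw):
    # Consumer side, mirrored order: UTF-8 bytes -> JSON text -> Python object.
    return json.loads(raw.decode('utf-8'))


def broken_deserialize(raw):
    # Same order as the lambda in the question: json.loads returns a dict,
    # and a dict has no decode() method.
    return json.loads(raw).decode('utf-8')


payload = serialize({"dataObjectID": "test1"})
roundtrip = deserialize(payload)

try:
    broken_deserialize(payload)
    broken_error = None
except AttributeError as exc:
    broken_error = exc

print(payload)
print(roundtrip)
print(type(broken_error).__name__)
```

The two working functions are exact mirrors of each other, which is a quick way to sanity-check any serializer/deserializer pair.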

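Answer 1 (score: 2)

It turns out the problem was the decode part of value_deserializer=lambda m: json.loads(m).decode('utf-8'). json.loads already returns a Python object, so there is nothing left to decode. When I changed it to value_deserializer=lambda m: json.loads(m), the object read from Kafka was a dictionary, which is consistent with the following conversion table from the Python 2 json documentation: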

|---------------------|------------------|
|       JSON          |     Python       |
|---------------------|------------------|
|      object         |      dict        |
|---------------------|------------------|
|      array          |      list        |
|---------------------|------------------|
|      string         |      unicode     |
|---------------------|------------------|
|      number (int)   |      int, long   |
|---------------------|------------------|
|      number (real)  |      float       |
|---------------------|------------------|
|      true           |      True        |
|---------------------|------------------|
|      false          |      False       |
|---------------------|------------------|
|      null           |      None        |
|---------------------|------------------|
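The mapping in the table can be observed directly. This is a small sketch assuming Python 3, where the table's `unicode` becomes `str` and `int, long` is simply `int`; the sample document and its keys are made up for illustration.

```python
import json

# Hypothetical sample covering each JSON type from the table.
raw = (b'{"obj": {"k": 1}, "arr": [1, 2], "s": "text", '
       b'"i": 7, "r": 1.5, "t": true, "f": false, "n": null}')

decoded = json.loads(raw.decode('utf-8'))

# Record the Python type that each JSON value was converted to.
type_names = {key: type(value).__name__ for key, value in decoded.items()}

for key, name in type_names.items():
    print(key, '->', name)
```

Since the top-level JSON value here is an object, the deserializer's result is a dict, which is why calling decode on it fails.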

Answer 2 (score: 1)

You don't need the lambda. Instead of

value_deserializer=lambda m: json.loads(m)

you should use

value_deserializer=json.loads
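Any callable that takes the raw bytes can be passed as value_deserializer, so a named function works as well as a lambda. The sketch below is an assumption-laden illustration, not part of kafka-python: `json_deserializer` is a hypothetical helper that returns None for records that are not valid UTF-8 JSON, such as the 'test' and '\xc2Hello, stranger!' values visible in the question's output. It also shows why json.loads, not json.load, is the function to pass: json.load expects a file-like object with a read() method.

```python
import json


def json_deserializer(raw):
    """Hypothetical helper: decode UTF-8 JSON bytes, or return None on failure."""
    if raw is None:
        return None
    try:
        return json.loads(raw.decode('utf-8'))
    except ValueError:
        # Covers both invalid UTF-8 (UnicodeDecodeError) and invalid JSON.
        return None


# Values taken from the ConsumerRecord output in the question.
samples = [b'{"dataObjectID": "test1"}', b'test', b'\xc2Hello, stranger!']
results = [json_deserializer(s) for s in samples]

# json.load tries to call read() on its argument, which bytes does not have.
try:
    json.load(samples[0])
    load_error = None
except AttributeError as exc:
    load_error = exc

print(results)
print(type(load_error).__name__)

# Usage with the consumer from the question would look like:
# KafkaConsumer(bootstrap_servers='localhost:9092',
#               value_deserializer=json_deserializer)
```

Whether to skip bad records or let the exception propagate is a design choice; returning None keeps the consumer loop alive when a topic contains older non-JSON messages.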