Reading Avro messages from Kafka in PySpark 2.2

Asked: 2018-04-16 10:32:04

Tags: python pyspark spark-streaming confluent-kafka confluent-schema-registry

I am using Confluent 4.0.0 to extract data from SQL Server into a Kafka topic.

I want to read the data stored in that Kafka topic from a Spark Streaming program, using the code below:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient
from confluent_kafka.avro.serializer.message_serializer import MessageSerializer

# Client for the Confluent Schema Registry; decode_message understands the
# Confluent wire format (magic byte + schema id + Avro-encoded body).
schema_registry_client = CachedSchemaRegistryClient(url='http://schemaregistryhostip:8081')
serializer = MessageSerializer(schema_registry_client)

sc = SparkContext(appName="KafkaAvroStream")  # predefined as sc in the pyspark shell
ssc = StreamingContext(sc, 30)  # 30-second micro-batches

# Direct stream from the brokers; every message value is run through the Avro decoder.
kvs = KafkaUtils.createDirectStream(ssc, ["topic name"],
                                    {"metadata.broker.list": "brokerip:9092"},
                                    valueDecoder=serializer.decode_message)
lines = kvs.map(lambda x: x[1])  # keep only the decoded message values
lines.pprint()

ssc.start()
ssc.awaitTermination()

But when I start the streaming job with ssc.start(), I get the error below:

ImportError: No module named confluent_kafka.avro.serializer
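
As far as I can tell, this ImportError means the confluent_kafka package is not importable by the Python interpreter that Spark runs with, and since valueDecoder executes on the executors the package would have to be installed on every worker node, not just the driver. One possible workaround (a sketch, not tested here) is to drop the confluent_kafka dependency from the decoder entirely and unpack the Confluent wire format by hand: 1 magic byte, a 4-byte big-endian schema id, then the Avro body, with schemas fetched over the Schema Registry REST endpoint GET /schemas/ids/{id}. The registry URL is the one from the code above; decode_confluent_avro is a made-up name, and this assumes the requests and avro packages are installed everywhere (on the Python 3 avro-python3 package the parser is spelled avro.schema.Parse instead of avro.schema.parse):

import io
import struct

import avro.io
import avro.schema
import requests

REGISTRY_URL = 'http://schemaregistryhostip:8081'  # same registry as above
_schema_cache = {}  # schema id -> parsed Avro schema, so each schema is fetched once

def decode_confluent_avro(raw_bytes):
    # Confluent wire format: magic byte (0), 4-byte big-endian schema id, Avro payload.
    if raw_bytes is None:
        return None
    magic, schema_id = struct.unpack('>bI', raw_bytes[:5])
    if schema_id not in _schema_cache:
        # Schema Registry REST API: GET /schemas/ids/{id} returns {"schema": "..."}
        resp = requests.get('{}/schemas/ids/{}'.format(REGISTRY_URL, schema_id))
        _schema_cache[schema_id] = avro.schema.parse(resp.json()['schema'])
    reader = avro.io.DatumReader(_schema_cache[schema_id])
    return reader.read(avro.io.BinaryDecoder(io.BytesIO(raw_bytes[5:])))

Such a function could then be passed as valueDecoder=decode_confluent_avro in place of serializer.decode_message.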

0 Answers