I'm using Confluent 4.0.0 (Kafka) to pull data from SQL Server into a Kafka topic.
I want to read the topic data stored in Kafka from a Spark Streaming program, using the following code:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient
from confluent_kafka.avro.serializer.message_serializer import MessageSerializer
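# Schema Registry client used to look up the Avro schema of each message
schema_registry_client = CachedSchemaRegistryClient(url='http://schemaregistryhostip:8081')
serializer = MessageSerializer(schema_registry_client)
# Reuse the pyspark shell's SparkContext if there is one, otherwise create it
sc = SparkContext.getOrCreate()
# 30-second batch interval
ssc = StreamingContext(sc, 30)
# decode_message deserializes each Avro-encoded value using the registry schema
kvs = KafkaUtils.createDirectStream(ssc, ["topic name"], {"metadata.broker.list": "brokerip:9092"}, valueDecoder=serializer.decode_message)
# Keep only the value from each (key, value) pair
lines = kvs.map(lambda x: x[1])
lines.pprint()
ssc.start()
ssc.awaitTermination()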
But when I start Spark Streaming with ssc.start(), I get the following error:
ImportError: No module named confluent_kafka.avro.serializer
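The valueDecoder passed to KafkaUtils.createDirectStream is applied inside a map over the stream, so it most likely runs on the executors rather than only on the driver, and anything it imports (here confluent_kafka.avro.serializer) has to be importable there too. To show what that decoder has to do with each record value, here is a minimal sketch that only splits the Confluent wire-format framing (magic byte 0, then a 4-byte big-endian schema ID, then the Avro-encoded body) using nothing but the standard library. The helper name split_confluent_frame is hypothetical, and it does not decode the Avro body itself; that step would still need an Avro library available on the executors.

```python
import struct

# Confluent wire format, as assumed here: byte 0 is the magic byte (0),
# bytes 1-4 are the schema ID (big-endian unsigned int), and the rest is
# the Avro binary-encoded record.
MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # 5 bytes: signed char + unsigned int


def split_confluent_frame(value):
    """Hypothetical helper: return (schema_id, avro_payload) for one record value.

    Returns None for a None value (e.g. a tombstone). Does not decode the Avro body.
    """
    if value is None:
        return None
    if len(value) < HEADER.size:
        raise ValueError("message too short for Confluent framing")
    magic, schema_id = HEADER.unpack_from(value, 0)
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    # Everything after the 5-byte header is the Avro payload
    return schema_id, bytes(value[HEADER.size:])
```

For example, split_confluent_frame(b"\x00\x00\x00\x00\x2a" + b"payload") returns (42, b"payload"). To receive raw bytes in Spark you would pass valueDecoder=lambda v: v, since the default decoder treats values as UTF-8 text, which binary Avro data generally is not.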