Producer/consumer with Kafka and Spark Streaming

Date: 2018-07-10 11:18:30

Tags: pyspark apache-kafka spark-streaming apache-kafka-connect

I am trying to write producer and consumer code (Python) using Kafka and Spark Streaming. There is a producer that generates random odometry messages in JSON format and uses a timer thread to publish a message on a topic every 3 seconds:

import json
import logging
import math
import random
import threading
from random import randint

from kafka import KafkaProducer
from kafka.errors import KafkaError

logging.basicConfig()
log = logging.getLogger(__name__)

def sendMessage():

    # the function re-schedules itself, so a message is sent every 3 seconds
    threading.Timer(3.0, sendMessage).start()

    # connection with the message broker
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                             value_serializer=lambda m: json.dumps(m).encode('ascii'))

    # the id is initially fixed to 1, but there could be more robots
    robotId = 1
    # generation of random odometry values
    deltaSpace = randint(1, 9)
    thetaTwist = random.uniform(0, math.pi * 2)

    future = producer.send('odometry',
                           key=b'message',
                           value={'robotId': robotId,
                                  'deltaSpace': deltaSpace,
                                  'thetaTwist': thetaTwist}) \
                     .add_callback(on_send_success) \
                     .add_errback(on_send_error)

    # block for 'synchronous' sends
    try:
        record_metadata = future.get(timeout=10)
    except KafkaError:
        # decide what to do if the produce request failed...
        log.exception("produce request failed")

    producer.flush()

def on_send_success(record_metadata):
    print("topic name: " + record_metadata.topic)
    print("partition: " + str(record_metadata.partition))
    print("offset: " + str(record_metadata.offset))

def on_send_error(excp):
    log.error('I am an errback', exc_info=excp)
    # handle the exception

sendMessage()
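For reference, a message published on the odometry topic should look roughly like this (the values are random, so this is only an example):

{"robotId": 1, "deltaSpace": 4, "thetaTwist": 2.0943951023931953}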

Then there is a consumer that reads the messages from the same topic in 3-second batches and processes them with Spark Streaming; this is the code:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

# create a local StreamingContext with two working threads and a batch interval of 3 seconds
sc = SparkContext("local[2]", "OdometryConsumer")
ssc = StreamingContext(sc, 3)

# the direct stream yields (key, value) tuples, so the JSON payload is the second element
kafkaStream = KafkaUtils.createDirectStream(ssc, ['odometry'], {'metadata.broker.list': 'localhost:9092'})
parsed = kafkaStream.map(lambda v: json.loads(v[1]))

def f(rdd):
    # foreachRDD hands over one RDD per batch; collect() brings the
    # records back to the driver so they can be printed
    for record in rdd.collect():
        print(record)

parsed.foreachRDD(f)

ssc.start()
ssc.awaitTermination()
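As a simpler sanity check, I believe the DStream API also offers pprint(), which prints the first elements of every batch to the driver's stdout; replacing the foreachRDD call with it would look like:

parsed.pprint()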

To run the application, I start the Zookeeper server on port 2181:

sudo /opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
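To confirm that Zookeeper is actually listening, I assume its four-letter-word commands can be used (the server should answer imok):

echo ruok | nc localhost 2181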

Then I start the Kafka server/broker on port 9092:

sudo /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
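If automatic topic creation is disabled on the broker, the odometry topic would have to be created first; with the scripts shipped under /opt/kafka this should be something like:

sudo /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic odometry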

Then I start the producer and the consumer:

python3 Producer.py
./spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/en1gma/SparkConsumer.py

The application runs without errors, but I am not sure the messages are really being consumed. What can I do to verify this?
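So far the only check I know of is reading the topic from the Kafka side with the console consumer, which only proves that the messages reach the broker, not that Spark actually processes them:

sudo /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic odometry --from-beginning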

Thanks.

0 Answers:

There are no answers yet.