I am having a problem processing a data stream with Spark Streaming and storing it in MongoDB. The scenario is the following: a publisher sends some data (for example, the angle of a robot's wheels and the distance travelled), and a consumer reads this data from Kafka and processes it with Spark Streaming (computing the robot's coordinates in the XY plane) before storing it in MongoDB. The problem, specifically, is this: although each message is consumed only once and the direct stream contains a single RDD, the message gets processed three times, so I get three updates instead of one. This only happens when I store the data in MongoDB; if I just pprint() the stream instead, it does not happen. Here is the code:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import math
import time
from pyspark import SparkConf
import pymongo_spark
# Important: activate pymongo_spark.
pymongo_spark.activate()
startInformation = {'robot id': 'r',
                    'x coordinate': '0',
                    'y coordinate': '0',
                    'speed': '0',
                    'delta space': 's',
                    'theta twist': 't',
                    'timeStamp': 'ts'}
oldX = ''
oldY = ''
# Create a local StreamingContext with two working threads and a batch interval of 3 seconds
sc = SparkContext("local[2]", "OdometryConsumer")
ssc = StreamingContext(sc, 3)
kafkaStream = KafkaUtils.createDirectStream(ssc, ['odometry'], {'metadata.broker.list': 'localhost:9092'})
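# NOTE (my assumption about the input, inferred from the parsing in
# getPositionSpeed below): each message value on the 'odometry' topic is
# four space-separated "label:value" fields, e.g.
#   id:r1 space:0.5 theta:1.57 ts:1500000000
# Only the part after each ':' is used, so the labels here are illustrative.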
def getPositionSpeed(line):
    # Read the previous position and timestamp back from the state file
    # (assumes info.txt already contains one line of the form "x y ts")
    fr = open('/home/erca/Scrivania/proveTesi/info.txt', 'r')
    for l in fr.readlines():
        oldX = float(l.split(' ')[0])
        oldY = float(l.split(' ')[1])
        try:
            oldTs = int(l.split(' ')[2])
        except:
            oldTs = int(time.time())
    fr.close()
    # Parse the Kafka message value (one "label:value" field per attribute)
    fields = line[1].split(" ")
    robotId = fields[0].split(":")[1]
    deltaSpace = float(fields[1].split(":")[1])
    thetaTwist = float(fields[2].split(":")[1])
    ts = int(fields[3].split(":")[1])
    # Dead-reckoning step: advance the old position by deltaSpace
    # along the heading thetaTwist
    newX = oldX + deltaSpace * math.cos(thetaTwist)
    newY = oldY + deltaSpace * math.sin(thetaTwist)
    print("******************************** old ts: " + str(oldTs))
    print("******************************** new ts: " + str(ts))
    print("******************************** space: " + str(deltaSpace))
    print("******************************** angle: " + str(thetaTwist))
    try:
        speed = float(deltaSpace) / float(ts - oldTs)
    except Exception as e:
        speed = str(e)
        #speed = float(9999999999999)
    # Persist the new position so the next message starts from it
    fw = open('/home/erca/Scrivania/proveTesi/info.txt', 'w')
    fw.write(str(newX) + " " + str(newY) + " " + str(ts))
    fw.close()
    startInformation['robot id'] = robotId
    startInformation['x coordinate'] = newX
    startInformation['y coordinate'] = newY
    startInformation['speed'] = speed
    startInformation['delta space'] = deltaSpace
    startInformation['theta twist'] = thetaTwist
    startInformation['timeStamp'] = ts
    print("-------------------------------" + str(startInformation) + "-------------------------------")
    return startInformation
elaborate = kafkaStream.map(getPositionSpeed)
#elaborate.pprint()
def sendRecord(rdd):
    try:
        #rdd.pprint()  # pprint() exists on DStreams, not on RDDs
        rdd.saveToMongoDB('mongodb://localhost:27017/marco.odometry')
    except:
        # note: the bare except silently swallows any write error
        pass
elaborate.foreachRDD(sendRecord)
ssc.start() # Start the computation
ssc.awaitTermination() # Wait for the computation to terminate
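One hypothesis I have is that saveToMongoDB causes the RDD to be recomputed, so the side effects in getPositionSpeed (the prints and the writes to info.txt) run once per evaluation. A minimal check along these lines, which I have not verified and which uses only standard RDD methods (cache, count), would be to persist the batch before saving it:

def sendRecord(rdd):
    rdd.cache()  # keep the mapped records in memory so that saving them
                 # does not re-run getPositionSpeed on the raw messages
    print("records in this batch: " + str(rdd.count()))  # forces one evaluation
    rdd.saveToMongoDB('mongodb://localhost:27017/marco.odometry')

If the prints from getPositionSpeed still appear three times with this version, the re-processing happens upstream of the save; if they appear once, the save itself is re-evaluating the RDD.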
Can anyone help me? Thanks!