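I receive data and compute the average of the temperature values (avg = sum / count). When I [...] ~1 min [...], so for approximately 59 min [...].
Here is a snippet of the console output (after sending was stopped):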
PairDStream<deviceId, [value]>
Time: 1472801338000 ms
Time: 1472801338000 ms
Time: 1472801339000 ms
Time: 1472801339000 ms
Time: 1472801340000 ms
Time: 1472801340000 ms
Time: 1472801341000 ms
Time: 1472801341000 ms
Time: 1472801342000 ms
Time: 1472801342000 ms
Time: 1472801343000 ms
Time: 1472801343000 ms
Time: 1472801344000 ms
Time: 1472801344000 ms
Edit: using
JavaReceiverInputDStream<String> ingoingStream = streamingContext.socketTextStream(serverIp, 11833);
// 2. Map the DStream<String> to a DStream<SensorData> by deserializing JSON objects
JavaDStream<SensorData> sensorDStream = ingoingStream.map(new Function<String, SensorData>() {
public SensorData call(String json) throws Exception {
ObjectMapper om = new ObjectMapper();
return (SensorData)om.readValue(json, SensorData.class);
}
});
/************************************************ MOVING AVERAGE OF TEMPERATURE *******************************************************************/
// Collect the data to a window of time (this is the time period for average calculation, older data is removed from stream!)
JavaDStream<SensorData> windowMovingAverageSensorDataTemp = sensorDStream.window(windowSizeMovingAverageTemperature);
// Map this SensorData stream to a new PairDStream, with key = deviceId (so we can make calculations by grouping by the id)
// .cache the Stream, because we re-use it more than 1 time!
JavaPairDStream<String, SensorData> windowMovingAverageSensorDataTempPairDStream = windowMovingAverageSensorDataTemp.mapToPair(new PairFunction<SensorData, String, SensorData>() {
public Tuple2<String, SensorData> call(SensorData data) throws Exception {
return new Tuple2<String, SensorData>(data.getIdSensor(), data);
}
}).cache();
// a) Map the PairDStream from above to a new PairDStream of form <deviceID, temperature>
// b) Sum up the values to the total sum, grouped also by key (= device id)
// => combined these two transactions, could also be called separately (like above)
JavaPairDStream<String, Float> windowMovingAverageSensorDataTempPairDStreamSum = windowMovingAverageSensorDataTempPairDStream.mapToPair(new PairFunction<Tuple2<String,SensorData>, String, Float>() {
public Tuple2<String, Float> call(Tuple2<String, SensorData> sensorDataPair) throws Exception {
String key = sensorDataPair._1();
Float value = sensorDataPair._2().getValTemp();
return new Tuple2<String, Float>(key, value);
}
}).reduceByKey(new Function2<Float, Float, Float>() {
public Float call(Float sumA, Float sumB) throws Exception {
return sumA + sumB;
}
});
// a) Map the PairDStream from above to a new PairDStream of form <deviceID, 1L> to prepare the counting (1 = 1 entry)
// b) Sum up the values to the total count of entries, grouped by key (= device id)
// => also combined both calls
JavaPairDStream<String, Long> windowMovingAverageSensorDataTempPairDStreamCount = windowMovingAverageSensorDataTempPairDStream.mapToPair(new PairFunction<Tuple2<String,SensorData>, String, Long>() {
public Tuple2<String, Long> call(Tuple2<String, SensorData> sensorDataPair) throws Exception {
String key = sensorDataPair._1();
Long value = 1L;
return new Tuple2<String, Long>(key, value);
}
}).reduceByKey(new Function2<Long, Long, Long>() {
public Long call(Long countA, Long countB) throws Exception {
return countA + countB;
}
});
// Make a join of the sum and count Streams, so this puts together data with same keys (device id)
// This results in a new PairDStream of <deviceID, <sumOfTemp, countOfEntries>>
JavaPairDStream<String, Tuple2<Float, Long>> windowedTempJoinPairDStream = windowMovingAverageSensorDataTempPairDStreamSum.join(windowMovingAverageSensorDataTempPairDStreamCount).cache();
// Calculate the average temperature by avg = sumOfTemp / countOfEntries, do this for each key (device id)
JavaPairDStream<String, Float> windowedTempAvg = windowedTempJoinPairDStream.mapToPair(new PairFunction<Tuple2<String,Tuple2<Float,Long>>, String, Float>() {
public Tuple2<String, Float> call(Tuple2<String, Tuple2<Float, Long>> joinedData) throws Exception {
String key = joinedData._1();
float tempSum = joinedData._2()._1();
long count = joinedData._2()._2();
float avg = tempSum / (float)count;
return new Tuple2<String, Float>(key, avg);
}
});
// print the joined PairDStream from above to check sum & count visually
windowedTempJoinPairDStream.print();
// print the final, calculated average values for each device id in form (deviceId, avgTemperature)
windowedTempAvg.print();
// ========================================================= START THE STREAM ============================================================
// Start streaming & listen until stream is closed
streamingContext.start();
streamingContext.awaitTermination();
The Spark app that performs the average calculation:
Time: 1473077627000 ms
Time: 1473077628000 ms
Time: 1473077629000 ms
Time: 1473077630000 ms
Time: 1473077631000 ms
Time: 1473077632000 ms
Time: 1473077633000 ms
Answer 0 (score: 1)
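At least at first glance, this is not particularly strange. As you have already suggested, it is most likely caused by rounding errors. Floating-point addition is not associative (it is commutative, but the grouping of the operands changes the rounding), and Spark shuffles are nondeterministic, so the order in which partial sums are combined can differ between batches and between runs. We can therefore expect the result to fluctuate slightly.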
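The effect of summation order can be reproduced without Spark. The following is a minimal sketch in plain Java; the class name FloatSumOrder and the sample values are made up for illustration and are not taken from the question. It adds the same three float values in two different orders, as might happen when a shuffle delivers them differently.

```java
class FloatSumOrder {
    /** Adds the values from left to right in single (float) precision. */
    static float sum(float... values) {
        float total = 0f;
        for (float v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Same three numbers, two arrival orders.
        float a = sum(1.0e8f, -1.0e8f, 1.0f); // the large values cancel first, then 1.0 is added
        float b = sum(1.0e8f, 1.0f, -1.0e8f); // 1.0 is rounded away when added to 1.0e8
        System.out.println(a + " vs " + b);   // prints "1.0 vs 0.0"
    }
}
```

With realistic temperature readings the differences are far smaller, but the mechanism is the same: each intermediate sum is rounded, and which intermediate sums occur depends on the order of the operands.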
Possible remedies include BigDecimal, or implementing a variant of the online algorithm, which has better numerical properties.
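As a rough illustration of the online approach, here is a sketch of an incrementally updated mean in plain Java. The class RunningMean and its method names are hypothetical, and the use of double accumulators is an assumption rather than something the answer prescribes. The add step folds in one value, and the merge step combines two partial results, which is the shape a per-key aggregation in Spark would need.

```java
class RunningMean {
    long count;   // number of values seen so far
    double mean;  // current mean of those values

    /** Folds one value into the running mean. */
    RunningMean add(double x) {
        count++;
        mean += (x - mean) / count;
        return this;
    }

    /** Combines this partial result with another one (weighted by their counts). */
    RunningMean merge(RunningMean other) {
        long total = count + other.count;
        if (total > 0) {
            mean += (other.mean - mean) * other.count / total;
            count = total;
        }
        return this;
    }

    public static void main(String[] args) {
        RunningMean left = new RunningMean().add(1.0).add(2.0);
        RunningMean right = new RunningMean().add(3.0).add(4.0);
        System.out.println(left.merge(right).mean);
    }
}
```

Because the mean is updated by small corrections instead of being derived from one large sum, the intermediate values stay close to the magnitude of the data. The result can still depend on merge order, only less strongly than with a plain float sum.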