Spark reduce

Time: 2016-12-29 09:05:33

Tags: apache-spark rdd reduce

My streaming data looks like this:

id, date, value
i1, 12-01-2016, 10
i2, 12-02-2016, 20
i1, 12-01-2016, 30
i2, 12-05-2016, 40

I want to reduce by ID so that the values are aggregated per date, as follows.

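For each ID, the required RDD output is a list with one slot per day of the year (365 days). Each value has to be placed at the list position given by the date's day of the year; for example, 12-01-2016 is position 336. Device i1 has two records for the same date, so their values should be aggregated into one slot.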

id, List [ 0 | 1 | 2 | ... |  336  | 337 | ... | 340 | ... | 365 ]
i1,                        | 10+30 |                                  <- goes to position 336
i2,                                |  20 | ... |  40 |                <- goes to positions 337 and 340
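The positions in the layout above can be checked with a short, Spark-free sketch. It assumes the dates are month-day-year (`MM-dd-yyyy`), as the sample rows suggest; the class name `DayOfYearIndex` and the method `dayOfYear` are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DayOfYearIndex {
    // Assumption: dates are written month-day-year, e.g. "12-01-2016".
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    /** Returns the 1-based day of the year (1..366) for the given date string. */
    public static int dayOfYear(String date) {
        return LocalDate.parse(date.trim(), FORMAT).getDayOfYear();
    }

    public static void main(String[] args) {
        System.out.println(dayOfYear("12-01-2016")); // 336
        System.out.println(dayOfYear("12-02-2016")); // 337
        System.out.println(dayOfYear("12-05-2016")); // 340
    }
}
```

Note that 2016 is a leap year, so 12-31-2016 maps to 366. A list indexed directly by this number would need 367 slots, or the index would have to be shifted down by one.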

Please advise which reduce or group-by transformation to use.

2 answers:

Answer 0: (score: 0)

I can give you a basic code snippet, but you did not say which language, data source, or data format you are using.

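import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

JavaDStream<String> lineStream = //Your data source for stream
// First reduce: sum the values that share the same "id,date" key.
JavaPairDStream<String, Long> firstReduce = lineStream.mapToPair(line -> {
    String[] fields = line.split(",");
    // Keep the comma between id and date so the key can be split again later.
    String idDate = fields[0].trim() + "," + fields[1].trim();
    Long value = Long.valueOf(fields[2].trim());
    return new Tuple2<String, Long>(idDate, value);
}).reduceByKey((v1, v2) -> v1 + v2);
firstReduce.map(idDateValueTuple -> {
    String[] idAndDate = idDateValueTuple._1().split(",");
    Long valueSum = idDateValueTuple._2();
    String id = idAndDate[0];
    String date = idAndDate[1];
    //TODO parse date and put the valueSum in array as you wish
    return new Tuple2<String, Tuple2<String, Long>>(id, new Tuple2<String, Long>(date, valueSum));
});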
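One way the TODO could be filled in is sketched below in plain Java, without Spark. It is only an illustration under assumptions: the dates are `MM-dd-yyyy`, the array has 367 slots so that the 1-based day of the year can be used directly as the index (slot 0 stays unused), and the names `YearArraySketch`, `SLOTS` and `toYearArray` are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class YearArraySketch {
    // Assumption: dates are written month-day-year, e.g. "12-01-2016".
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    // 367 slots: index 0 is unused, indexes 1..366 hold one value per day of the year.
    public static final int SLOTS = 367;

    /** Builds an otherwise empty year array with valueSum stored at the date's day of the year. */
    public static long[] toYearArray(String date, long valueSum) {
        long[] year = new long[SLOTS];
        int day = LocalDate.parse(date.trim(), FORMAT).getDayOfYear();
        year[day] = valueSum;
        return year;
    }

    public static void main(String[] args) {
        // i1 has 10 + 30 = 40 on 12-01-2016 after the first reduce.
        long[] i1 = toYearArray("12-01-2016", 40L);
        System.out.println(i1[336]); // 40
    }
}
```

Each (id, date) pair then yields one sparse array per ID, and a second reduce keyed by ID alone would still be needed to merge the arrays of the same ID.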

Answer 1: (score: 0)

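This is as far as I could get. I am not sure how to add the arrays element by element in the last step. Hope this helps! If you work out the last step, or find another way, I would appreciate it if you posted it here.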

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.rdd.RDD

// Days between 01-01-2016 and the given date (0 for Jan 1, so 12-01-2016 gives 335).
def getDateDifference(dateStr: String): Int = {
  val formatter = DateTimeFormatter.ofPattern("MM-dd-yyyy")
  val startDate = LocalDate.parse("01-01-2016", formatter)
  val currentDate = LocalDate.parse(dateStr.trim, formatter)
  (currentDate.toEpochDay - startDate.toEpochDay).toInt
}
// Array of 366 zeros with `data` stored at index `numberOfDays`.
def getArray(numberOfDays: Int, data: Int): Array[Int] = {
  val daysArray = new Array[Int](366)
  daysArray(numberOfDays) = data
  daysArray
}
val idRDD: RDD[String] = ??? // read from stream
// ((id, date), (dayIndex, value)); fields are trimmed because the input has spaces after the commas.
val idRDDMap = idRDD.map { rec =>
  val fields = rec.split(",").map(_.trim)
  ((fields(0), fields(1)), (getDateDifference(fields(1)), fields(2).toInt))
}
val idRDDconsiceMap = idRDDMap.map { rec => (rec._1._1, getArray(rec._2._1, rec._2._2)) }
// ??? is left open: each element of the two arrays still has to be added here.
val finalRDD = idRDDconsiceMap.reduceByKey((acc, value) => ???)
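The open last step is an element-wise sum of two arrays of equal length. A minimal sketch of that merge in plain Java follows; the class name `ArrayMerge` and the method `addArrays` are hypothetical, and the function is meant as the kind of associative, commutative combiner that `reduceByKey` expects. In Scala the same idea can be written as `acc.zip(value).map { case (a, b) => a + b }`.

```java
import java.util.Arrays;

public class ArrayMerge {
    /** Returns a new array whose i-th element is left[i] + right[i]. */
    public static long[] addArrays(long[] left, long[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long[] sum = new long[left.length];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = left[i] + right[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two tiny 3-slot arrays stand in for the 366-slot year arrays.
        long[] merged = addArrays(new long[] {0, 10, 0}, new long[] {5, 30, 0});
        System.out.println(Arrays.toString(merged)); // [5, 40, 0]
    }
}
```

Returning a fresh array instead of mutating `left` keeps the combiner free of side effects, at the cost of one allocation per merge.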