My streaming data looks like this:
id, date, value
i1, 12-01-2016, 10
i2, 12-02-2016, 20
i1, 12-01-2016, 30
i2, 12-05-2016, 40
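I want to reduce by id to get the aggregated value for each date, as shown below.
The required output RDD holds, for each id, a list with one slot per day of the year (up to 365). Each value has to go into the list position given by its date's day of the year; for example, 12-01-2016 is position 336. Device i1 has two records for that date, so their values should be summed.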
id, List [0|1|2|3|... |336| 337| |340| |365]
i1, |10+30| - this goes to 336 position
i2, 20 40 -- this goes to 337 and 340 position
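Please advise which reduce or group-by transformation to use.
Answer 0 (score: 0)
I'll give you a basic code snippet, but you didn't specify the language, the data source, or the data format.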
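import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

JavaDStream<String> lineStream = //Your data source for stream
// Key each record by "id,date" and sum the values that share a key.
JavaPairDStream<String, Long> firstReduce = lineStream.mapToPair(line -> {
    String[] fields = line.split(",");
    // Keep the comma in the key so it can be split back into id and date later.
    String idDate = fields[0].trim() + "," + fields[1].trim();
    Long value = Long.valueOf(fields[2].trim());
    return new Tuple2<String, Long>(idDate, value);
}).reduceByKey((v1, v2) -> v1 + v2);
firstReduce.map(idDateValueTuple -> {
    String[] idAndDate = idDateValueTuple._1().split(",");
    String id = idAndDate[0];
    String date = idAndDate[1];
    Long valueSum = idDateValueTuple._2();
    //TODO parse date and put the sumValue in array as you wish
    return idDateValueTuple; // placeholder return until the TODO is filled in
});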
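One way the TODO in the snippet above could be filled in is sketched below in plain Java, without Spark, so it can be run on its own. It assumes the dates are in MM-dd-yyyy format, as in the question's sample data, and that the list position is the 1-based day of the year, which is what makes 12-01-2016 land on 336. The class name `DayOfYearSlots`, the helper names, and the choice of 367 slots (index 0 unused, so that day 366 of a leap year still fits) are illustrative assumptions, not part of the original answer.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DayOfYearSlots {
    // Assumption: dates look like 12-01-2016 (MM-dd-yyyy), as in the question.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM-dd-yyyy");

    // Returns the 1-based day of the year (1..366) for a date string.
    static int dayOfYear(String date) {
        return LocalDate.parse(date.trim(), FORMAT).getDayOfYear();
    }

    // Builds a 367-slot array (index 0 unused) with the sum stored at the day-of-year index.
    static long[] toSlots(String date, long valueSum) {
        long[] slots = new long[367];
        slots[dayOfYear(date)] = valueSum;
        return slots;
    }

    public static void main(String[] args) {
        long[] slots = toSlots("12-01-2016", 40L);
        System.out.println(dayOfYear("12-01-2016") + " -> " + slots[336]); // prints "336 -> 40"
    }
}
```

Inside the `map` lambda, a call such as `toSlots(date, valueSum)` could then replace the TODO, with the id kept as the key so that the per-day arrays of one id can be merged in a later step.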
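Answer 1 (score: 0)
This is as far as I could get. I'm not sure how to add the arrays element by element in the last step. Hope this helps! If you work out the last step, or find another way, I'd appreciate it if you posted it here.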
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Days elapsed since 01-01-2016 (0-based, so 12-01-2016 gives 335).
def getDateDifference(dateStr: String): Int = {
  val formatter = DateTimeFormatter.ofPattern("MM-dd-yyyy")
  val startDate = LocalDate.parse("01-01-2016", formatter)
  val currentDate = LocalDate.parse(dateStr.trim, formatter)
  (currentDate.toEpochDay - startDate.toEpochDay).toInt
}

// A 366-slot array holding `data` at the given day offset and 0 everywhere else.
def getArray(numberofDays: Int, data: Int): Iterable[Int] = {
  val daysArray = new Array[Int](366)
  daysArray(numberofDays) = data
  daysArray
}

val idRDD = <read from stream>
// Shape: ((id, date), (dayOffset, value))
val idRDDMap = idRDD.map { rec =>
  val fields = rec.split(",").map(_.trim)
  ((fields(0), fields(1)), (getDateDifference(fields(1)), fields(2).toInt))
}
val idRDDconsiceMap = idRDDMap.map { rec => (rec._1._1, getArray(rec._2._1, rec._2._2)) }
val finalRDD = idRDDconsiceMap.reduceByKey((acc, value) => ???) // TODO: add each element of the arrays
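The missing last step is an element-wise sum of two per-day arrays of equal length. A minimal sketch of that operation follows, written in plain Java with no Spark dependency so it can be run directly; the class name `DailyTotals` and the method name `addElementWise` are hypothetical. In the Scala snippet above, the same idea could probably be written inside `reduceByKey` as `acc.zip(value).map { case (a, b) => a + b }`, though that has not been tested against the stream here.

```java
class DailyTotals {
    // Adds two equally sized per-day arrays slot by slot and returns a new array.
    static int[] addElementWise(int[] left, int[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int[] sum = new int[left.length];
        for (int i = 0; i < left.length; i++) {
            sum[i] = left[i] + right[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // The two i1 records for 12-01-2016, at the 0-based offset 335 from 01-01-2016.
        int[] first = new int[366];
        int[] second = new int[366];
        first[335] = 10;
        second[335] = 30;
        int[] merged = addElementWise(first, second);
        System.out.println(merged[335]); // prints 40
    }
}
```

Because the function is associative and commutative, it is suitable as a reduce function: the order in which the arrays for one id are merged does not change the result.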