我有一个RDD
,RDD
类型是Tuple2(value,timestamp)
,值是1或0,时间戳是顺序,变量limitTime = 4。当我映射RDD
时,如果值为1,则从当前时间戳到(timestamp + limitTime)的输出值为1,否则当前值为0,我将其称为句点。但是有一种特殊情况,当值为1并且其时间戳在句点中时,则忽略它,输出的当前值为0
input : (0,0),(1,1),(0,3),(0,5),(0,7),(0,8),(0,10),(1,12),(0,14),(0,15)
expected output :(0,0),(1,1),(1,3),(1,5),(0,7),(0,8),(0,10),(1,12),(1,14),(1,15)
special input2: (0,0),(1,1),(0,3),(1,5),(0,7),(1,8),(0,10),(1,12),(0,14),(0,15)
expected output2:(0,0),(1,1),(1,3),(1,5),(0,7),(1,8),(1,10),(1,12),(0,14),(0,15)
this is my try:
var limitTime=4
var startTime= -limitTime
val rdd=sc.parallelize(List((0,0),(1,1),(0,3),(1,5),(0,7),(1,8),(0,10),(1,12),(0,14),(0,15)),4)
val results=rdd.mapPartitions(parIter => {
var resultIter = new ArrayBuffer[Tuple2[Int,Int]]()
while (parIter.hasNext) {
val iter = parIter.next()
if(iter._1==1){
if(iter._2<=startTime+limitTime&&iter._2!=0&&iter._2>=startTime){
resultIter.append(iter)
}else{
resultIter.append(iter)
startTime=iter._2
}
}else{
if(iter._2<=startTime+limitTime&&iter._2!=0&&iter._2>=startTime){
resultIter.append((1,iter._2))
}else{
resultIter.append(iter)
}
}
}
resultIter.toIterator
})
results.collect().foreach(println)
答案 0 :(得分:0)
Following code should work for both of your cases.
var limitTime=3
var first = true
var previousValue = 0
val rdd=sc.parallelize(List((0,0),(1,1),(0,3),(0,5),(0,7),(0,8),(0,10),(1,12),(0,14),(0,15)), 4)
val tempResult = rdd.collect.map(pair => {
if(first){
first = false
previousValue = pair._1
(pair._1, pair._2)
}
else {
if ((pair._1 == 1 || previousValue == 1) && limitTime > 0) {
limitTime -= 1
previousValue = 1
(1, pair._2)
}
else {
if (limitTime == 0) limitTime = 3
previousValue = pair._1
(pair._1, pair._2)
}
}
})
tempResult.foreach(print)
If it doesn't please let me know