I have an RDD that closely resembles this:
1.0,2.0,0.0019,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,3.0,0.0,3.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,5.0,-0.0019,-2.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.4294
1.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,7.0,0.0,1.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,8.0,0.0,3.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9040.8,0.0,0.0,0.0,0.0,0.0,0.0
1.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,10.0,-0.0033,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47.03,0.0,0.0,0.0,0.0
1.0,11.0,0.0,-3.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,554.54,0.0,0.0,0.0,0.0,0.0,0.0,8140.58,0.0
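I need to filter out the rows in which the number of zeros equals a specific number, say 15. The following definition of the filter function removes more rows than expected.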
def filterZeroRowsWReadings(row: Array[String]) = {
  var flag: Int = 0
  for (value <- row) {
    if (value.toDouble == 0.0)
      flag = flag + 1
  }
  flag match {
    case 15 => false
    case _ => true
  }
}
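However, I manually counted the rows with that number of zeros in a subset of my RDD and got 3,834, while the filter method above removes 3,960 rows. I don't understand where the extra 126 rows come from. Is there a way to find out what is happening? On a smaller RDD the result is as expected, but on a large RDD it is not.
Thanks.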
Answer 0: (score: 1)
Maybe this is a precision issue? You could try comparing each value as a string against "0.0" and see whether anything changes.
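To try the string comparison, here is a minimal sketch that counts zeros both ways and flags rows where the two counts differ. The object and function names (ZeroCountDiagnostics, numericZeros, literalZeros, disagrees) are made up for illustration. It rests on a guess, not a confirmed diagnosis: that the manual count matched the text "0.0", whereas toDouble also treats fields such as "-0.0", "0", "0.00" or " 0.0" as zero.

```scala
object ZeroCountDiagnostics {
  // Counts fields that parse to numeric zero, as the original filter does.
  // This also matches "0", "0.00", "-0.0", "0E0" and values padded with spaces.
  def numericZeros(row: Array[String]): Int =
    row.count(_.toDouble == 0.0)

  // Counts fields whose text is exactly "0.0", as a manual text search would.
  def literalZeros(row: Array[String]): Int =
    row.count(_ == "0.0")

  // True when the two counting rules give different results for this row.
  def disagrees(row: Array[String]): Boolean =
    numericZeros(row) != literalZeros(row)
}
```

Assuming the RDD is an RDD[Array[String]] named rdd (the question does not give its name), something like rdd.filter(ZeroCountDiagnostics.disagrees).take(20) should show a sample of rows that the two rules count differently, and rdd.filter(ZeroCountDiagnostics.disagrees).count() tells you whether they account for the 126 extra rows. If no rows disagree, the difference has another cause.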