Scala function to filter fields for only numerics

Asked: 2015-06-05 13:29:04

Tags: scala apache-spark

I have the following functions:

  def isAllDigits(x: String) = x forall Character.isDigit

  def filterNum(x: (Int, String)) : Boolean = {
    accumNum.add(1)
    if(isAllDigits(x._2)) false
    else true
  }

I pass in key/value pairs and want to check whether the value is numeric. For some reason, it filters out these:

res10: Array[(Int, String)] = Array((1,18964), (2,39612), (3,1), (4,""), (5,""), (6,""), (7,""), (8,""), (9,1), (10,""))

but lets these through:

res9: Array[(Int, String)] = Array((18,1000.0), (22,23.99), (18,1001.0), (22,23.99), (18,300.0), (22,23.99), (18,300.0), (22,23.99), (18,300.0), (22,23.99))

Does .isDigit only allow doubles? I'm also confused about why a double/int that is passed in gets treated as a String when x is (Int, String).

Edit: I'm using this function in Spark as follows:

val numFilterRDD = numRDD.filter(filterNum)

Sample output of numRDD.take():

res11: Array[(Int, String)] = Array((1,18964), (2,39612), (3,1), (4,""), (5,""), (6,""), (7,""), (8,""), (9,1), (10,""), (11,""), (16,""), (18,1000.0), (19,""), (20,""), (21,""), (22,23.99), (23,""), (24,""), (25,""))

1 answer:

Answer 0 (score: 1)

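The problem is that forall runs the check on each character separately. So for a double it also checks the decimal point, which is not a digit by itself: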

Character.isDigit('.') //false
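
To make the per-character behavior concrete, here is a minimal sketch using sample values from the question. The object name `ForallDemo` is hypothetical, and the helper is written out with an explicit lambda but is meant to behave like the question's `isAllDigits`. Note the last case: `forall` is vacuously true on an empty string, which is why the `""` values are treated as "all digits" too.

```scala
object ForallDemo {
  // Equivalent to the question's helper: true only if every character is a digit.
  def isAllDigits(x: String): Boolean = x.forall(c => Character.isDigit(c))

  def main(args: Array[String]): Unit = {
    println(isAllDigits("18964"))  // true: every character is a digit
    println(isAllDigits("1000.0")) // false: '.' is not a digit
    println(isAllDigits(""))       // true: forall holds vacuously on an empty string
  }
}
```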

You'd be better off using a regular expression.

x matches """^\d+(\.\d+)?$"""
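
As a rough sketch of how a regex-based check could be wired into a filter predicate: the names `NumericFilter`, `NumericPattern`, and `isNumeric` are made up for illustration, and the pattern `\d+(\.\d+)?` is one possible choice that accepts plain integers (including single digits such as "1") and simple decimals, but not signs, exponents, or a leading or trailing dot. The sample pairs are copied from the question's output, and a plain `Seq` stands in for the RDD.

```scala
object NumericFilter {
  // Digits, optionally followed by a decimal point and more digits.
  // String.matches tests the whole string, so explicit ^ and $ anchors are not required.
  private val NumericPattern = """\d+(\.\d+)?"""

  // Hypothetical helper: true for values like "1", "18964", "23.99"; false for "".
  def isNumeric(s: String): Boolean = s.matches(NumericPattern)

  def main(args: Array[String]): Unit = {
    // Sample pairs taken from the numRDD output shown in the question.
    val rows = Seq((1, "18964"), (3, "1"), (4, ""), (18, "1000.0"), (22, "23.99"))

    // Keep only the pairs whose value looks numeric; the empty string is dropped.
    val numeric = rows.filter { case (_, v) => isNumeric(v) }
    println(numeric)
  }
}
```

A predicate of the same shape, `(Int, String) => Boolean`, should also work with `numRDD.filter`, assuming `numRDD` holds `(Int, String)` pairs as in the question.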