Why does scala.util.matching.Regex "apparently" fail as a Scala extractor?

Time: 2018-01-17 15:23:04

Tags: java regex scala pattern-matching data-extraction

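I am using Scala extractors (i.e. a regex in a pattern match) to recognize doubles and longs, as shown below.

My question is: why does the Regex apparently fail in a pattern match, while it apparently delivers the expected results when used in a chain of if / then / else expressions?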

val LONG   = """^(0|-?[1-9][0-9]*)$"""
val DOUBLE = """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$"""

val scalaLONG   : scala.util.matching.Regex = LONG.r
val scalaDOUBLE : scala.util.matching.Regex = DOUBLE.r

val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
    text match {
      case scalaLONG(long)     => s"Long"
      case scalaDOUBLE(double) => s"Double"
      case _                   => s"String"
    })
// Results types1: Seq[String] = List("String", "Long", "String", "String", "String")

val types2 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
    if(scalaDOUBLE.findFirstIn(text).isDefined) "Double" else
    if(scalaLONG  .findFirstIn(text).isDefined) "Long"   else    
    "String")
// Results types2: Seq[String] = List("String", "Long", "Double", "Double", "Double")

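As you can see above, types2 gives the expected results, whereas types1 reports "String" where "Double" is expected, which apparently points to a failure in the regex processing.

Edit: With help from @alex-savitsky and @leo-c, I arrived at what is shown below, which works as expected. However, I have to remember to provide an empty argument list in the pattern match, otherwise it gives wrong results. This seems error-prone to me.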

val LONG   = """^(?:0|-?[1-9][0-9]*)$"""
val DOUBLE = """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$"""

val scalaLONG   : scala.util.matching.Regex = LONG.r
val scalaDOUBLE : scala.util.matching.Regex = DOUBLE.r

val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
    text match {
      case scalaLONG()     => s"Long"
      case scalaDOUBLE()   => s"Double"
      case _               => s"String"
    })
// Results types1: Seq[String] = List("String", "Long", "Double", "Double", "Double")

val types2 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
    if(scalaDOUBLE.findFirstIn(text).isDefined) "Double" else
    if(scalaLONG  .findFirstIn(text).isDefined) "Long"   else    
    "String")
// Results types2: Seq[String] = List("String", "Long", "Double", "Double", "Double")

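Edit: OK... error-prone as it may be... it is an extractor pattern, which uses unapply behind the scenes (unapplySeq, in the case of Regex), and in that case we have to pass arguments to unapply. @alex-savitsky uses _* in his edit, which makes explicit the intent to discard all capture groups. That looks good to me.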
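
For readers who would rather not depend on the argument list at all, one possible alternative is to test for a whole-input match in a pattern guard. This is only a sketch, not something proposed in the thread: the names Classify, LongRe, DoubleRe and fullMatch are made up for illustration, and the regexes are the original ones from the question with their capture groups left in.

```scala
import scala.util.matching.Regex

object Classify {
  // Regexes copied from the question; the capture groups are left in on purpose.
  val LongRe: Regex   = """^(0|-?[1-9][0-9]*)$""".r
  val DoubleRe: Regex = """NaN|^-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r

  // Whole-input test that does not depend on the number of capture groups.
  private def fullMatch(re: Regex, text: String): Boolean =
    re.pattern.matcher(text).matches()

  def classify(text: String): String = text match {
    case t if fullMatch(LongRe, t)   => "Long"
    case t if fullMatch(DoubleRe, t) => "Double"
    case _                           => "String"
  }
}
```

Because the guard never destructures the match, adding or removing a capture group in either regex does not change the outcome.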

2 answers:

Answer 0: (score: 1)

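A match against a regex has to match the whole input, whereas findFirstIn only looks for a matching substring, which can lead to more matches. The anchors ^ and $ still apply to findFirstIn, but only to the alternative they are written in: in NaN|^...$ the NaN alternative carries no anchor, so findFirstIn also accepts an input such as "xNaNx".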
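
The difference between a substring search and a whole-input match can be sketched with a much smaller regex. The patterns and names below (AnchorDemo, loose, strict, found, whole) are hypothetical stand-ins chosen for brevity, not the regexes from the question.

```scala
import scala.util.matching.Regex

object AnchorDemo {
  // Alternation binds loosest, so this reads as `NaN` OR `^[0-9]+$`:
  // only the second alternative is anchored.
  val loose: Regex = """NaN|^[0-9]+$""".r

  // Grouping the alternatives puts both of them between the anchors.
  val strict: Regex = """^(?:NaN|[0-9]+)$""".r

  // Substring search, as used by types2 in the question.
  def found(re: Regex, text: String): Boolean =
    re.findFirstIn(text).isDefined

  // Whole-input match, which is what a `case re(...)` pattern requires.
  def whole(re: Regex, text: String): Boolean =
    re.pattern.matcher(text).matches()
}
```

With these definitions, found(loose, "xNaNx") is true while found(strict, "xNaNx") and whole(loose, "xNaNx") are both false, and found(loose, "x12") is false because the anchored alternative still has to cover the entire input.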

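If you intend to match the whole input, put ^ at the beginning of the regex, as in val DOUBLE = """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""". Moving the anchor alone is not enough for types1, however: the capture groups also have to be dealt with, which the edit below addresses.

Edit: here is my test case for the problem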

object Test extends App {
    val regex = """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$""".r
    println(Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map {
        case regex() => "Double"
        case _ => "String"
    })
}

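The result is List(String, String, Double, Double, Double)

As you can see, the non-capturing groups make an important difference.

If you still want to use capture groups, you can use _* to ignore the captured results: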
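
One way to see why the non-capturing version works with an empty argument list is to ask the compiled pattern how many capture groups it declares. The sketch below uses the two regexes from this answer; the object and method names (GroupCount, capturing, nonCapturing, groups) are made up for illustration.

```scala
import scala.util.matching.Regex

object GroupCount {
  // The regex with ordinary (capturing) parentheses.
  val capturing: Regex =
    """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r

  // The same regex with every group rewritten as non-capturing.
  val nonCapturing: Regex =
    """^NaN|-?(?:0(?:\.[0-9]*)?|(?:[1-9][0-9]*\.[0-9]*)|(?:\.[0-9]+))(?:[Ee][+-]?[0-9]+)?$""".r

  // Number of capture groups declared by the underlying java.util.regex.Pattern.
  def groups(re: Regex): Int = re.pattern.matcher("").groupCount()
}
```

Here groups(capturing) is 5 and groups(nonCapturing) is 0, which is the number of arguments each extractor pattern has to be written with.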

object Test extends App {
    val regex = """^NaN|-?(0(\.[0-9]*)?|([1-9][0-9]*\.[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?$""".r
    println(Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map {
        case regex(_*) => "Double"
        case _ => "String"
    })
}

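Answer 1: (score: 1)

Since you defined multiple capture groups in scalaDOUBLE, you need to supply a matching number of arguments in the corresponding match case, as follows: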

val types1 = Seq("abc", "3", "3.0", "-3.0E-05", "NaN").map(text =>
  text match {
    case scalaLONG(long)                 => s"Long"
    case scalaDOUBLE(d1, d2, d3, d4, d5) => s"Double"
    case _                               => s"String"
  })
// types1: Seq[String] = List(String, Long, Double, Double, Double)
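
The arity rule can be seen in isolation with a smaller regex. This is a minimal sketch: Span is a hypothetical two-group regex and the three methods are named only for illustration.

```scala
import scala.util.matching.Regex

object ArityDemo {
  // Hypothetical regex with exactly two capture groups.
  val Span: Regex = """(\d+)-(\d+)""".r

  // One pattern variable against two groups: compiles, but never matches.
  def oneVar(s: String): String = s match {
    case Span(a) => s"one group: $a"
    case _       => "no match"
  }

  // Two pattern variables against two groups: matches.
  def twoVars(s: String): String = s match {
    case Span(a, b) => s"$a to $b"
    case _          => "no match"
  }

  // A sequence wildcard accepts any number of groups and can bind them all.
  def anyVars(s: String): String = s match {
    case Span(all @ _*) => all.mkString(",")
    case _              => "no match"
  }
}
```

For the input "1-2", oneVar returns "no match", twoVars returns "1 to 2" and anyVars returns "1,2". The mismatch in oneVar is not reported at compile time, which is why it is easy to overlook.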

You can inspect the captured groups as follows:

"-3.0E-05" match { case scalaDOUBLE(d1, d2, d3, d4, d5) => (d1, d2, d3, d4, d5) }
// res1: (String, String, String, String, String) = (3.0,null,3.0,null,E-05)
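
As the result above shows, a group that took no part in the match comes back as null. A common way to handle that, sketched here under the assumption that the bound values will be used afterwards, is to wrap them in Option. The regex Decimal and the method parts are hypothetical and kept deliberately small.

```scala
import scala.util.matching.Regex

object OptionalGroup {
  // Hypothetical regex: an integer part and an optional fractional part.
  val Decimal: Regex = """(\d+)(\.\d+)?""".r

  // Returns the integer part and, if present, the fractional part.
  def parts(s: String): Option[(String, Option[String])] = s match {
    case Decimal(int, frac) => Some((int, Option(frac))) // Option(null) is None
    case _                  => None
  }
}
```

With this, parts("3") gives Some(("3", None)) and parts("3.5") gives Some(("3", Some(".5"))), so callers never see a null.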