How do I parse a line of text with a Scala regular expression?

Posted: 2014-08-26 16:37:46

Tags: regex scala

I'm using Scala to process lines of data-rich text. Here is one example:

0101 Test        A123456-7 N   Ag Ri              R 123 Im K8 V

To parse this, I ported a regular expression I've used in other languages. However, I'm doing something wrong. My broken object is:

object UwpParser extends App
{
   val Pattern = "^(\\d\\d\\d\\d) (\\S.+) ([ABCDEX]\\d\\d\\d\\d\\d\\d-\\d) (..)\\s*(\\w.{17}) (.) (\\d\\d\\d) (\\w\\w) (.*)$".r;

   var data = scala.io.Source.fromFile( "test.txt" ).getLines.mkString;

   for (p <- Pattern findAllIn data) p match
   {
      case Pattern(c) => println( c )
      case _ => None
   }
}

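The for block is only there to check whether I've captured my data. Evidently I haven't. I'm sure I'm doing plenty wrong. I've searched Stack Overflow, but the questions there seem different from mine, or there's something I'm not getting.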
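One likely reason the loop above prints nothing: when a Regex is used as an extractor, as in case Pattern(c), it only matches if the number of variables equals the number of capturing groups (and the regex must match the whole string). The pattern here has nine groups, so a one-variable case can never succeed and every value falls through to case _. A minimal sketch with a hypothetical two-group regex (ArityDemo, Pair and describe are illustrative names, not from the post):

```scala
import scala.util.matching.Regex

object ArityDemo {
  // Hypothetical pattern with exactly two capturing groups
  val Pair: Regex = """(\d+)-(\d+)""".r

  def describe(s: String): String = s match {
    case Pair(c)    => s"one group: $c"         // never matches: 1 variable vs 2 groups
    case Pair(a, b) => s"two groups: $a and $b" // variable count equals group count
    case _          => "no match"
  }
}
```

Swapping the order of the first two cases makes no difference: the one-variable case is simply never taken.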

Update: thanks to whoever posted the Scaladoc reference! My corrected code is:

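object UwpParser extends App
{
   val Pattern = """^(\d\d\d\d) (\S.+) ([ABCDEX]\d\d\d\d\d\d-\d) (..)\s*(\w.{17}) (.) (\d\d\d) (\w\w) (.*)$""".r

   // getLines().mkString joins the lines with no separator, which is fine for a one-line file
   val data = scala.io.Source.fromFile("test.txt").getLines().mkString

   data match {
      // one variable per capturing group: the pattern has nine
      case Pattern(hex, name, uwp, bases, codes, zone, pbg, alleg, stellar) => println(s"$name ($hex) $uwp")
      case _ => println("no match")
   }
}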
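The corrected version can be wrapped in a function that returns an Option instead of throwing a MatchError on lines that don't fit. The sketch below (UwpLine and summary are hypothetical names) also checks that the triple-quoted pattern is the same text as the double-escaped one from the first attempt: inside a triple-quoted string a backslash is taken literally, so \d needs no second backslash.

```scala
import scala.util.matching.Regex

object UwpLine {
  // Pattern from the corrected code, written as a triple-quoted (raw) string
  val Pattern: Regex =
    """^(\d\d\d\d) (\S.+) ([ABCDEX]\d\d\d\d\d\d-\d) (..)\s*(\w.{17}) (.) (\d\d\d) (\w\w) (.*)$""".r

  // Pattern from the first attempt, with every backslash escaped in an ordinary string
  val Escaped: Regex =
    "^(\\d\\d\\d\\d) (\\S.+) ([ABCDEX]\\d\\d\\d\\d\\d\\d-\\d) (..)\\s*(\\w.{17}) (.) (\\d\\d\\d) (\\w\\w) (.*)$".r

  // Returns (hex, name, uwp) for a matching line and None otherwise, so no MatchError
  def summary(line: String): Option[(String, String, String)] = line match {
    case Pattern(hex, name, uwp, _, _, _, _, _, _) => Some((hex, name.trim, uwp))
    case _                                          => None
  }
}
```

The (\w.{17}) group is fixed-width, so a line only matches when its column spacing is exact; name is trimmed because (\S.+) also captures the padding before the UWP code.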

1 Answer

Answer 0 (score: 2)

The Scaladoc in a recent nightly build has been clarified:

http://www.scala-lang.org/files/archive/nightly/2.11.x/api/2.11.x/#scala.util.matching.Regex

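It has plenty of examples of capturing groups in pattern matching.

I hope this version of the docs is easier to read.

Also, don't you mean to match the regex against each line of the data, like this: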
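Besides extractor patterns, a Regex can return Regex.Match objects, whose capturing groups are read by index with group(n) or all at once with subgroups. A short sketch; the CodeName pattern and the MatchDemo helpers are hypothetical:

```scala
import scala.util.matching.Regex

object MatchDemo {
  // Hypothetical pattern: a four-digit code followed by a word
  val CodeName: Regex = """(\d{4}) (\w+)""".r

  // findFirstMatchIn is unanchored, so the match may start anywhere in the input
  def firstCode(s: String): Option[String] =
    CodeName.findFirstMatchIn(s).map(m => m.group(1))

  // subgroups lists every capturing group of the match, in order
  def allGroups(s: String): List[String] =
    CodeName.findFirstMatchIn(s).map(_.subgroups).getOrElse(Nil)
}
```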

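val p = """your regex""".r
// assumes text is a scala.io.Source, e.g. scala.io.Source.fromFile("test.txt")
for (line <- text.getLines()) {
  line match {
    case p(field1, field2, field3, _*) => // do something with the first 3 capturing groups
    case _ => // this line did not match; skip it
  }
}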

rather than gluing the lines together and then pulling them apart again?
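Here is the per-line approach as a runnable sketch. It assumes the input is already in memory, so Source.fromString stands in for Source.fromFile, and the two-field pattern p is a placeholder for the real one. Using collect with the extractor skips non-matching lines instead of throwing a MatchError; the hypothetical glued helper shows what getLines.mkString does to the same input.

```scala
import scala.io.Source
import scala.util.matching.Regex

object PerLine {
  // Placeholder two-field pattern: first word, then the rest of the line
  val p: Regex = """(\S+)\s+(.*)""".r

  // Matches each line separately; lines that do not match are skipped
  def firstWords(text: String): List[String] =
    Source.fromString(text).getLines().collect {
      case p(first, _*) => first
    }.toList

  // What getLines.mkString does: the lines are concatenated with no separator
  def glued(text: String): String = Source.fromString(text).getLines().mkString
}
```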

Just for fun, and for completeness:

scala> val text = "Now is the time\nfor all good men\nto come home for dinner."
text: String =
Now is the time
for all good men
to come home for dinner.

scala> val r = """(?m)^(\S+)\s*(.*)$""".r
r: scala.util.matching.UnanchoredRegex = (?m)^(\S+)\s*(.*)$

scala> r findAllMatchIn text map (_ group 1) toList
warning: there was one feature warning; re-run with -feature for details
res0: List[String] = List(Now, for, to)

scala> r findAllMatchIn text map { case r(first, rest) => s"$first! ($rest)" } toList
warning: there was one feature warning; re-run with -feature for details
res1: List[String] = List(Now! (is the time), for! (all good men), to! (come home for dinner.))

Really, this was to remind myself what the inline flag is: (?m), where m stands for multiline.
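To see what (?m) changes, compare the same anchored pattern with and without the flag on the text above. Without it, ^ matches only at the start of the whole input; with it, ^ also matches at the start of each line. A minimal sketch (MultilineFlag and firstWords are illustrative names):

```scala
import scala.util.matching.Regex

object MultilineFlag {
  val text = "Now is the time\nfor all good men\nto come home for dinner."

  // Without (?m), ^ anchors only at the very start of the input
  val plain: Regex = """^(\S+)""".r
  // With the inline (?m) flag, ^ also anchors right after each line terminator
  val multi: Regex = """(?m)^(\S+)""".r

  def firstWords(r: Regex): List[String] =
    r.findAllMatchIn(text).map(_.group(1)).toList
}
```

Here firstWords(plain) finds only Now, while firstWords(multi) finds Now, for and to, the same words as res0 above.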