Scala regex pattern matching

Date: 2015-04-28 21:44:30

Tags: regex scala

I need to match a pattern in Scala using a regular expression. I currently have this regex:

InputPattern: scala.util.matching.Regex = put (.*) in (.*)

This is what happens when I do the following:

scala> val InputPattern(verb, item, prep, obj) = "put a in b";
scala.MatchError: put a in b (of class java.lang.String)
... 33 elided 

I'd like to end up with verb("put"), item("a"), prep("in") and obj("b") for the input "put a in b", and with verb("put"), item(""), prep("in") and obj("") for the input "put in".
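A likely cause of the MatchError: a Scala Regex used as an extractor must match the entire string, and it yields exactly one value per capturing group. The pattern put (.*) in (.*) has only two groups, so binding four variables fails even for "put a in b", and "put in" does not match at all because the pattern requires a space on both sides of in. The sketch below checks both points; the object and method names are hypothetical, chosen for illustration.

```scala
object QuestionPattern {
  // The regex from the question: it has only two capturing groups.
  val InputPattern = "put (.*) in (.*)".r

  // Binding two variables matches the two groups. Returns None when the whole string does not match.
  def parse(s: String): Option[(String, String)] = s match {
    case InputPattern(item, obj) => Some((item, obj))
    case _                       => None
  }

  // Binding four variables never matches, because the extractor only produces two values.
  def fourWay(s: String): Boolean = s match {
    case InputPattern(_, _, _, _) => true
    case _                        => false
  }

  // Number of capturing groups, read via the underlying java.util.regex.Pattern.
  def groupCount: Int = InputPattern.pattern.matcher("").groupCount()
}
```

With this pattern, QuestionPattern.parse("put a in b") gives Some(("a", "b")), while the four-variable form and the input "put in" both fail to match.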

Thanks

2 answers:

Answer 0 (score: 1)

You could write a single regex for all cases, but I doubt it would be readable or maintainable. I prefer a simpler approach:

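// Scala regex extractors must match the whole input string
val pattern1 = "(put) (.*) (in) (.*)".r   // "put <item> in <obj>"
val pattern2 = "(put) (in)".r             // bare "put in"
def parse(text: String) = text match {
  case pattern1(verb, item, prep, obj) => (verb, item, prep, obj)
  case pattern2(verb, prep)            => (verb, "", prep, "")
  // any other input throws scala.MatchError
}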
scala> parse("put a in b")
res6: (String, String, String, String) = (put,a,in,b)

scala> parse("put in")
res7: (String, String, String, String) = (put,"",in,"")
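As written, parse throws a scala.MatchError for any input that matches neither pattern. Here is a minimal sketch of a total variant that returns an Option instead. SafeParse is a hypothetical name; the two patterns are the ones from this answer.

```scala
object SafeParse {
  // Same patterns as above; each must match the entire input.
  val pattern1 = "(put) (.*) (in) (.*)".r
  val pattern2 = "(put) (in)".r

  // Returns None instead of throwing MatchError for unrecognised input.
  def parse(text: String): Option[(String, String, String, String)] = text match {
    case pattern1(verb, item, prep, obj) => Some((verb, item, prep, obj))
    case pattern2(verb, prep)            => Some((verb, "", prep, ""))
    case _                               => None
  }
}
```

SafeParse.parse("take a") then yields None rather than an exception, which is easier to handle in a command loop.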

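One more thought: I hope you know what you're doing! Regular expressions can only describe regular languages (Chomsky Type 3 grammars), and natural language is far more complex than that. If you need a natural-language parser, use an existing solution such as the Stanford NLP parser.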

Answer 1 (score: 1)

This works for your particular case:

scala> val InputPattern = "(put) (.*?) ?(in) ?(.*?)".r
InputPattern: scala.util.matching.Regex = (put) (.*?) ?(in) ?(.*?)

scala> val InputPattern(verb, item, prep, obj) = "put a in b"
verb: String = put
item: String = a
prep: String = in
obj: String = b

scala> val InputPattern(verb, item, prep, obj) = "put in"
verb: String = put
item: String = ""
prep: String = in
obj: String = ""
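Here, put and in are captured in groups as well, so they take part in the pattern match. I also used the lazy quantifier (.*?) so that each group captures as little as possible; you could replace it with (\S*). The " ?" parts make the spaces optional, which is what lets "put in" match (there is one space between put and in, and none at the end).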

But watch out for this:

scala> val InputPattern(verb, item, prep, obj) = "put ainb"
verb: String = put
item: String = a
prep: String = in
obj: String = b

scala> val InputPattern(verb, item, prep, obj) = "put aininb"
verb: String = put
item: String = a
prep: String = in
obj: String = inb

scala> val InputPattern(verb, item, prep, obj) = "put ain"
verb: String = put
item: String = a
prep: String = in
obj: String = ""

If you are writing a simple command interpreter, this may be good enough; otherwise, you should match your special cases separately.
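If you would rather reject the inputs above within a single regex, one option is a stricter pattern that requires whitespace around put and in and treats item and obj as single non-space tokens. This is a hypothetical variant, not something from the answer. Optional groups that did not take part in the match come back as null from Scala's regex extractor, hence the Option(...) wrapping.

```scala
object StrictPattern {
  // Hypothetical stricter pattern: whitespace is required between the words,
  // and item/obj (both optional) are single non-space tokens.
  val Strict = """(put)(?:\s+(\S+))?\s+(in)(?:\s+(\S+))?""".r

  // Unmatched optional groups are null, so convert them to "".
  def parse(s: String): Option[(String, String, String, String)] = s match {
    case Strict(verb, item, prep, obj) =>
      Some((verb, Option(item).getOrElse(""), prep, Option(obj).getOrElse("")))
    case _ => None
  }
}
```

With this pattern "put ainb", "put aininb" and "put ain" all return None, while "put a in b" and "put in" still parse as before.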

For simple (non-natural) languages you could also consider StandardTokenParsers, since they can handle context-free grammars (Chomsky Type 2):

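// Since Scala 2.11, scala-parser-combinators is a separate module and must be on the classpath
import scala.util.parsing.combinator.syntactical._

// Anonymous subclass: calling p.p later is a reflective (structural-type) call,
// which is what triggers the feature warning below
val p = new StandardTokenParsers {
   lexical.reserved ++= List("put", "in")           // lex "put" and "in" as keywords, not identifiers
   def p = "put" ~ opt(ident) ~ "in" ~ opt(ident)  // put [item] in [obj]
}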

scala> p.p(new p.lexical.Scanner("put a in b"))
warning: there was one feature warning; re-run with -feature for details
res13 = [1.11] parsed: (((put~Some(a))~in)~Some(b))

scala> p.p(new p.lexical.Scanner("put in"))
warning: there was one feature warning; re-run with -feature for details
res14 = [1.7] parsed: (((put~None)~in)~None)
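Here is a variant of the parser above, assuming the scala-parser-combinators module is available. The //> using line is a Scala CLI directive, and the version number is an assumption, so use whatever your build already has. Declaring a named object instead of an anonymous subclass avoids the reflective call behind the feature warning. Wrapping the rule in phrase(...) makes it reject trailing input. The names PutParser, command and parse are illustrative.

```scala
//> using dep org.scala-lang.modules::scala-parser-combinators:2.3.0
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// A named object, so PutParser.parse is an ordinary (non-reflective) method call.
object PutParser extends StandardTokenParsers {
  // "put" and "in" become keyword tokens, so ident never matches them.
  lexical.reserved ++= List("put", "in")

  // put [item] in [obj]; the optional identifiers become Options.
  def command: Parser[(Option[String], Option[String])] =
    "put" ~> opt(ident) ~ ("in" ~> opt(ident)) ^^ { case item ~ obj => (item, obj) }

  // phrase(...) only succeeds if the whole input is consumed.
  def parse(s: String): Option[(Option[String], Option[String])] = {
    val result = phrase(command)(new lexical.Scanner(s))
    if (result.successful) Some(result.get) else None
  }
}
```

Because the lexer splits the input into tokens first, "put ainb" is read as put followed by the identifier ainb and is rejected, unlike with the regex.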