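No match found when the regular expression is defined inside a function

Time: 2015-06-30 09:13:59

Tags: regex scala

Scala newbie here! I'm trying to define a function that takes a string as input and returns part of that string. When I do this by hand with a regular expression it works fine, but when I put the same code inside a function it doesn't seem to find a match. Can someone explain this to me?

Here is my string: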

val str = """1.1.1.1 - - [30/Apr/2015:13:23:20 +0200] "GET /S1/HLS_LIVE/slowturk/32/prog_index21964.ts?key=36ec178eee7ae44f1b204aec4627a120&app=com.radyolar.slowturk.iphone HTTP/1.1" 200 0 "-" "AppleCoreMedia/1.0.0.12F70 (iPhone; U; CPU OS 8_3 like Mac OS X; de_de)" "-" 0.005 ut="0.005" cs="MISS""""

Here is the function definition:

def foo(record: String): String = {
    val p_ip = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"
    val p_client = "(\\S+)"
    val p_user = "(\\S+)"
    val p_dateTime = "(\\[.+?\\])"
    val p_request = "\"(.+?)\""
    val p_status = "(\\d{3})"
    val p_bytes = "(\\S+)"
    val p_referer = "(\\S+)"
    val p_agent = "\\\"([^\"]+)\\\""
    val p_forward = "(\\S+)"
    val p_req_time = "(\\d\\.\\d\\d\\d)"
    val p_ut = "ut=\"([^\"]+)\""
    val p_cs = "cs=\"([^\"]+)\""
    val regex = s"$p_ip $p_client $p_user $p_dateTime $p_request $p_status $p_bytes $p_referer $p_agent $p_forward $p_req_time $p_ut $p_cs".r

    val grouped = regex.findAllIn(record)
    val ip = grouped.group(1)
    return ip 
  } 

This is the result I get:

    scala> foo(str)
java.lang.IllegalStateException: No match available
        at java.util.regex.Matcher.start(Matcher.java:372)
        at scala.util.matching.Regex$MatchIterator.start(Regex.scala:591)
        at scala.util.matching.Regex$MatchData$class.group(Regex.scala:454)
        at scala.util.matching.Regex$MatchIterator.group(Regex.scala:566)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.foo(<console>:74)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
        at $iwC$$iwC$$iwC.<init>(<console>:73)
        at $iwC$$iwC.<init>(<console>:75)
        at $iwC.<init>(<console>:77)
        at <init>(<console>:79)
        at .<init>(<console>:83)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Here is my regular expression written out explicitly, in case anyone wants to check it:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (\S+) (\S+) (\[.+?\]) "(.+?)" (\d{3}) (\S+) (\S+) \"([^"]+)\" (\S+) (\d\.\d\d\d) ut="([^"]+)" cs="([^"]+)"

It works when the same code is not wrapped in a function:

  val p_ip = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"
  val p_client = "(\\S+)"
  val p_user = "(\\S+)"
  val p_dateTime = "(\\[.+?\\])"
  val p_request = "\"(.+?)\""
  val p_status = "(\\d{3})"
  val p_bytes = "(\\S+)"
  val p_referer = "(\\S+)"
  val p_agent = "\\\"([^\"]+)\\\""
  val p_forward = "(\\S+)"
  val p_req_time = "(\\d\\.\\d\\d\\d)"
  val p_ut = "ut=\"([^\"]+)\""
  val p_cs = "cs=\"([^\"]+)\""
  val regex = s"$p_ip $p_client $p_user $p_dateTime $p_request $p_status $p_bytes $p_referer $p_agent $p_forward $p_req_time $p_ut $p_cs".r

  val grouped = regex.findAllIn(str)
  val ip = grouped.group(1) // ip is "1.1.1.1"

1 answer:

Answer 0 (score: 5)

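The method findAllIn returns an instance of MatchIterator. According to its documentation:

"All methods inherited from scala.util.matching.Regex.MatchData will throw a java.lang.IllegalStateException until the matcher is initialized. The matcher can be initialized by calling hasNext or next() or causing these methods to be called, such as by invoking toString or iterating through the iterator's elements."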

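When you run the code line by line in the console, the REPL calls toString on each result in order to print it. That call initializes the MatchIterator, so the later call to group works. Inside a function nothing is printed between findAllIn and group, so the matcher is still uninitialized when group is called.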

To get the same behavior inside a function, you can do the following:

def foo(record: String): String = {
    // omitted ...

    val grouped = regex.findAllIn(record)
    grouped.hasNext // Initializing MatchIterator
    val ip = grouped.group(1)
    ip 
}
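
Note that the snippet above discards the Boolean returned by hasNext, so a record that does not match would still make group(1) throw. A minimal sketch of two safer variants follows, assuming it is acceptable to return an Option[String] instead of a String. The names IpExtract, IpPrefix, ipViaIterator and ipViaFirstMatch are hypothetical, and the pattern is deliberately reduced to the leading IP field rather than the full log-line regex from the question.

```scala
import scala.util.matching.Regex

object IpExtract {
  // Simplified, hypothetical pattern: only the leading IPv4-like field
  // followed by a space, not the full log-line regex from the question.
  val IpPrefix: Regex = """^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) """.r

  // Variant 1: same idea as the answer, but the result of hasNext is checked,
  // so group(1) is only called when a match is actually available.
  def ipViaIterator(record: String): Option[String] = {
    val grouped = IpPrefix.findAllIn(record)
    if (grouped.hasNext) Some(grouped.group(1)) else None
  }

  // Variant 2: findFirstMatchIn returns an Option[Match] that is already
  // populated, so no explicit initialization step is needed.
  def ipViaFirstMatch(record: String): Option[String] =
    IpPrefix.findFirstMatchIn(record).map(_.group(1))
}
```

With these definitions, IpExtract.ipViaIterator("1.1.1.1 - - [30/Apr/2015:13:23:20 +0200]") yields Some("1.1.1.1"), and IpExtract.ipViaFirstMatch("not a log line") yields None instead of throwing. Returning an Option pushes the "no match" case to the caller, which may or may not suit the original use case.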