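Scala newbie here! I'm trying to define a function that takes a string as input and returns part of that string. When I run the regex steps by hand in the console it works fine, but when I put the same code inside a function it doesn't seem to find a match. Can someone explain this to me?
Here is my string: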
val str = """1.1.1.1 - - [30/Apr/2015:13:23:20 +0200] "GET /S1/HLS_LIVE/slowturk/32/prog_index21964.ts?key=36ec178eee7ae44f1b204aec4627a120&app=com.radyolar.slowturk.iphone HTTP/1.1" 200 0 "-" "AppleCoreMedia/1.0.0.12F70 (iPhone; U; CPU OS 8_3 like Mac OS X; de_de)" "-" 0.005 ut="0.005" cs="MISS""""
Here is the function definition:
def foo(record: String): String = {
val p_ip = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"
val p_client = "(\\S+)"
val p_user = "(\\S+)"
val p_dateTime = "(\\[.+?\\])"
val p_request = "\"(.+?)\""
val p_status = "(\\d{3})"
val p_bytes = "(\\S+)"
val p_referer = "(\\S+)"
val p_agent = "\\\"([^\"]+)\\\""
val p_forward = "(\\S+)"
val p_req_time = "(\\d\\.\\d\\d\\d)"
val p_ut = "ut=\"([^\"]+)\""
val p_cs = "cs=\"([^\"]+)\""
val regex = s"$p_ip $p_client $p_user $p_dateTime $p_request $p_status $p_bytes $p_referer $p_agent $p_forward $p_req_time $p_ut $p_cs".r
val grouped = regex.findAllIn(record)
val ip = grouped.group(1)
return ip
}
This is the result I get:
scala> foo(str)
java.lang.IllegalStateException: No match available
at java.util.regex.Matcher.start(Matcher.java:372)
at scala.util.matching.Regex$MatchIterator.start(Regex.scala:591)
at scala.util.matching.Regex$MatchData$class.group(Regex.scala:454)
at scala.util.matching.Regex$MatchIterator.group(Regex.scala:566)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.foo(<console>:74)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
at $iwC$$iwC$$iwC.<init>(<console>:73)
at $iwC$$iwC.<init>(<console>:75)
at $iwC.<init>(<console>:77)
at <init>(<console>:79)
at .<init>(<console>:83)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Here is my regex written out explicitly, in case anyone wants to check it:
(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) (\S+) (\S+) (\[.+?\]) "(.+?)" (\d{3}) (\S+) (\S+) \"([^"]+)\" (\S+) (\d\.\d\d\d) ut="([^"]+)" cs="([^"]+)"
It works when it is not defined inside a function:
val p_ip = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})"
val p_client = "(\\S+)"
val p_user = "(\\S+)"
val p_dateTime = "(\\[.+?\\])"
val p_request = "\"(.+?)\""
val p_status = "(\\d{3})"
val p_bytes = "(\\S+)"
val p_referer = "(\\S+)"
val p_agent = "\\\"([^\"]+)\\\""
val p_forward = "(\\S+)"
val p_req_time = "(\\d\\.\\d\\d\\d)"
val p_ut = "ut=\"([^\"]+)\""
val p_cs = "cs=\"([^\"]+)\""
val regex = s"$p_ip $p_client $p_user $p_dateTime $p_request $p_status $p_bytes $p_referer $p_agent $p_forward $p_req_time $p_ut $p_cs".r
val grouped = regex.findAllIn(str)
val ip = grouped.group(1) // ip is "1.1.1.1"
Answer 0 (score: 5)
The method findAllIn returns an instance of MatchIterator. According to its documentation:
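"All methods inherited from scala.util.matching.Regex.MatchData will throw a java.lang.IllegalStateException until the matcher is initialized. The matcher can be initialized by calling hasNext or next() or causing these methods to be called, such as by invoking toString or iterating through the iterator's elements."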
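When you run the code line by line in the console, the REPL calls toString on each result in order to print it. That call initializes the MatchIterator, so the subsequent call to group works. Inside the function nothing touches the iterator before group is called, hence the exception.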
To get the same behavior inside a function, you can do the following:
def foo(record: String): String = {
// omitted ...
val grouped = regex.findAllIn(record)
grouped.hasNext // Initializing MatchIterator
val ip = grouped.group(1)
ip
}
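As an alternative that does not depend on the iterator's internal state, here is a minimal sketch using findFirstMatchIn, which returns an Option[Match]. The names IpExtractor and extractIp are hypothetical and not part of the original code, and only the leading IP pattern from the question is used to keep the example short; the same idea should apply to the full combined regex.

```scala
import scala.util.matching.Regex

object IpExtractor {
  // Only the IP sub-pattern from the question; the full regex is omitted for brevity.
  val ipPattern: Regex = "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})".r

  // Hypothetical helper: returns Some(ip) for the first match, or None if nothing matches.
  // findFirstMatchIn yields a Match snapshot, so no explicit hasNext call is needed.
  def extractIp(record: String): Option[String] =
    ipPattern.findFirstMatchIn(record).map(_.group(1))
}
```

Calling IpExtractor.extractIp(str) with the log line from the question would give Some("1.1.1.1"), and a line without an IP address gives None instead of throwing.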