我有一些类似下面的日志
endeavor.fujitsu.co.jp - - [10/Jul/1995:00:00:15 -0400] "GET /images/ HTTP/1.0" 200 17688
ad13-022.compuserve.com - - [10/Jul/1995:00:00:15 -0400] "GET /history/gemini/gemini-spacecraft.txt HTTP/1.0" 200 651
pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713
204.239.199.40 - - [10/Jul/1995:00:00:16 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP/1.0" 200 45970
pm1-4.tricon.net - - [10/Jul/1995:00:00:17 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669
scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124
我需要从上述日志中提取路径。这是我尝试过的代码
val pattern = "\\s+([^\\s]+)\\s+HTTP".r
val match = pattern.findFirstIn(log)
这是我得到的输出。
/images/ HTTP
/history/gemini/gemini-spacecraft.txt HTTP
/images/launch-logo.gif HTTP
/shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP
/images/WORLD-logosmall.gif HTTP
/history/mercury/mr-3/mr-3.html HTTP
如何清除上述路径中的HTTP?
答案 0 :(得分:1)
答案 1 :(得分:0)
您的比赛位于第一个捕获组communicate()
中,您可以将其缩短为:
()
在Scala中
\s(\S+)\s+HTTP
您可能会使用findAllIn获取日志:
val pattern = "\\s(\\S+)\\s+HTTP".r
结果
val pattern = "\\s(\\S+)\\s+HTTP".r
val strings = List(
"""endeavor.fujitsu.co.jp - - [10/Jul/1995:00:00:15 -0400] "GET /images/ HTTP/1.0" 200 17688 """,
"""ad13-022.compuserve.com - - [10/Jul/1995:00:00:15 -0400] "GET /history/gemini/gemini-spacecraft.txt HTTP/1.0" 200 651 """,
"""pm2-15.magicnet.net - - [10/Jul/1995:00:00:15 -0400] "GET /images/launch-logo.gif HTTP/1.0" 200 1713 """,
"""204.239.199.40 - - [10/Jul/1995:00:00:16 -0400] "GET /shuttle/missions/sts-71/images/KSC-95EC-0613.gif HTTP/1.0" 200 45970 """,
"""pm1-4.tricon.net - - [10/Jul/1995:00:00:17 -0400] "GET /images/WORLD-logosmall.gif HTTP/1.0" 200 669 """,
"""scorpio.digex.net - - [10/Jul/1995:00:00:19 -0400] "GET /history/mercury/mr-3/mr-3.html HTTP/1.0" 200 1124"""
)
strings.foreach { log =>
val m = pattern.findAllIn(log).group(1)
println(m)
}
还要匹配注释中的这一行:
columbia.acc.brad.ac.uk--[10 / Jul / 1995:00:52:36 -0400]“获取 /ksc.html“ 200 7067
您可以使用:
/images/
/history/gemini/gemini-spacecraft.txt
/images/launch-logo.gif
/shuttle/missions/sts-71/images/KSC-95EC-0613.gif
/images/WORLD-logosmall.gif
/history/mercury/mr-3/mr-3.html
答案 2 :(得分:0)
您可以提取捕获组(请参阅其他答案),也可以简化正则表达式模式以仅匹配您感兴趣的内容。
val pattern = "\\s+/[^\\s]+".r