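I have a log file. I want to ignore the lines that contain /owncloud, find only the lines that contain .html, and print just the URL part of each one. The original file looks like this: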
1.1.1.1 - abcdefg [01/Dec/2013:03:18:19 +0900] "PROPFIND /owncloud/remote.php/webdav// HTTP/1.1" 111 111 "-" "Mozilla/5.0 (Macintosh) mirall/1.4.2"
2.2.2.2 - - [02/Dec/2013:17:28:29 +0900] "GET /img/bg_introduction.png HTTP/1.1" 111 1111 "-" "msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)"
3.3.3.3 - - [02/Dec/2013:15:46:25 +0900] "GET / HTTP/1.0" 111 1111 "-" "-"
4.4.4.4 - - [02/Dec/2013:08:54:13 +0900] "GET /xxxx/index.html HTTP/1.1" 111 1111 "http://xxxx.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64)
5.5.5.5 - - [02/Dec/2013:08:54:17 +0900] "GET /contact.html HTTP/1.1" 111 1111 "http://xxxx.com/yyyyy/zzzz.html" "Mozilla/5.0 (Windows NT
The output should be:
/xxxx/index.html
/contact.html
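I am learning regular expressions, but I can't work out how to find a word when it is sandwiched between other text. I also still don't understand how to cut out just the matched part. This is what I am doing right now: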
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseLog {
    static BufferedReader input;
    static final Pattern pattern = Pattern.compile("(/owncloud)");
    // The dot is escaped so that it matches a literal "." rather than any character
    static final Pattern pattern2 = Pattern.compile("(\\.html)");
    static Matcher matcher;
    static Matcher matcher2;

    public static void main(String[] args) throws IOException {
        input = new BufferedReader(new FileReader("/path to file /access_log.txt"));
        String c = "";
        while ((c = input.readLine()) != null) {
            // Split on double quotes; index 1 is the request part, e.g. GET /contact.html HTTP/1.1
            String[] splitString = c.split("\"");
            // If there is only one substring, the line has no quoted request
            if (splitString.length >= 2) {
                matcher = pattern.matcher(splitString[1]);
                matcher2 = pattern2.matcher(splitString[1]);
                // Keep requests that contain .html but not /owncloud
                if (!matcher.find() && matcher2.find()) {
                    String parsedString = splitString[1].replaceAll("GET ", "");
                    System.out.println(parsedString.replaceAll(" HTTP/1.1", ""));
                }
            }
        }
        input.close();
    }
}
My question is: can the five steps my program performs be done with a single regular expression?
Answer 0 (score: 3)
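It seems you are overcomplicating this. Try finding the part of the input that starts with GET and is followed by /[non-space-characters]+.html. You can wrap the /...html part in parentheses to place it in a capturing group.
Here is a code example that produces the same result you mention in your question: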
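import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One-liner that reads the whole file into a String (don't bother with it now).
// It must run inside a method that handles or declares FileNotFoundException.
String input = new Scanner(new File("input.txt")).useDelimiter("\\A").next();
// We want to find `GET /[non-whitespace-characters]+.html`
Pattern p = Pattern.compile("GET (/\\S+\\.html)");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(1)); // group 1 is the parenthesized path
}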
Output:
/xxxx/index.html
/contact.html
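The pattern above does not itself reject /owncloud requests; it gives the expected output on the sample only because no /owncloud line there is a GET for an .html file. If that exclusion needs to live in the same regular expression, one possible sketch is to add a negative lookahead right after `GET `. The class name `HtmlPathExtractor`, the method `extractHtmlPaths`, and the exact pattern are illustrative assumptions rather than part of the original answer, and the sample lines in `main` are shortened versions of the ones in the question plus one made-up /owncloud line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlPathExtractor {
    // Assumed pattern (not from the original answer):
    //   "GET               the opening quote plus GET, so a referer URL is never matched
    //   (?!\S*/owncloud)   negative lookahead: the path must not contain /owncloud
    //   (/\S+\.html)       group 1: the path, ending in a literal ".html"
    //   trailing space     the path must end right before the protocol field
    private static final Pattern HTML_GET =
            Pattern.compile("\"GET (?!\\S*/owncloud)(/\\S+\\.html) ");

    // Returns the captured path of every line whose request matches the pattern.
    static List<String> extractHtmlPaths(List<String> lines) {
        List<String> paths = new ArrayList<>();
        for (String line : lines) {
            Matcher m = HTML_GET.matcher(line);
            if (m.find()) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "1.1.1.1 - abcdefg [01/Dec/2013:03:18:19 +0900] \"PROPFIND /owncloud/remote.php/webdav// HTTP/1.1\" 111 111 \"-\" \"-\"",
            "3.3.3.3 - - [02/Dec/2013:15:46:25 +0900] \"GET / HTTP/1.0\" 111 1111 \"-\" \"-\"",
            "4.4.4.4 - - [02/Dec/2013:08:54:13 +0900] \"GET /xxxx/index.html HTTP/1.1\" 111 1111 \"http://xxxx.com/\" \"-\"",
            "5.5.5.5 - - [02/Dec/2013:08:54:17 +0900] \"GET /contact.html HTTP/1.1\" 111 1111 \"http://xxxx.com/yyyyy/zzzz.html\" \"-\"",
            // Hypothetical line, added only to exercise the lookahead
            "6.6.6.6 - - [02/Dec/2013:09:00:00 +0900] \"GET /owncloud/index.html HTTP/1.1\" 111 1111 \"-\" \"-\"");
        // Prints /xxxx/index.html and /contact.html, one per line
        extractHtmlPaths(sample).forEach(System.out::println);
    }
}
```

Matching line by line with `find()` keeps each result tied to one log entry, and anchoring on the opening quote means the `.html` URL in the referer field of the last sample line is not reported.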