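Java StringTokenizer for special characters

Time: 2014-02-20 22:19:50

Tags: java stringtokenizer

I don't want the string to be split inside pairs of special characters such as "", {}, and []. How can I do that?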

String: "192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] 'GET /cgi-bin/try/ HTTP/1.0' 200 3395"

I want this output:

192.168.2.20 
28/Jul/2006:10:27:10 -0300
GET /cgi-bin/try/ HTTP/1.0
200 3395

My code:

// Requires: import java.util.StringTokenizer;
String rawData = "192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] 'GET /cgi-bin/try/ HTTP/1.0' 200 3395";
int i = 0;
String[] s1 = new String[100];
String delim = " ";
// returnDelims = true: each space is also returned as a token
StringTokenizer tok = new StringTokenizer(rawData, delim, true);

boolean expectDelim = false;
while (tok.hasMoreTokens()) {
    String token = tok.nextToken();
    if (delim.equals(token)) {
        if (expectDelim) {
            // A single space after a value is only a separator: skip it
            expectDelim = false;
            continue;
        } else {
            // A second consecutive space marks an empty field
            token = null;
        }
    }
    s1[i] = token;
    System.out.println(s1[i]);
    i += 1;
    expectDelim = true;
}

Output:

192.168.2.20
-
-
[28/Jul/2006:10:27:10
-0300]
'GET
/cgi-bin/try/
HTTP/1.0'
200
3395

I can get this to work for this one log line, but I want my code to work for all Apache logs. How can I do that?

2 answers:

Answer 0 (score: 0)

You can use a regular expression like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Groups: 1 = client IP, 2 = timestamp, 3 = request line, 4 = status and size
        Pattern p = Pattern.compile("(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s.*\\s.*\\s\\[(.*)\\]\\s'(.*)'\\s(.*)");
        Matcher m = p.matcher("192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] 'GET /cgi-bin/try/ HTTP/1.0' 200 3395");

        // group() throws IllegalStateException unless the match succeeded
        if (m.matches()) {
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
            System.out.println(m.group(4));
        }
    }
}
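
The pattern above only matches lines with exactly this layout. If the goal is the more general one from the question (never split inside a pair of special characters), a looser variant is to let the regex describe a single token and collect the tokens with find(). This is only a sketch: the class name GroupedTokens and the method tokenize are made up for illustration, and it assumes the pairs are not nested and contain no escaped closing character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class GroupedTokens {
    // Alternatives are tried left to right: [...], '...', "...", {...},
    // and finally any run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile(
        "\\[([^\\]]*)\\]|'([^']*)'|\"([^\"]*)\"|\\{([^}]*)\\}|(\\S+)");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // Only the group of the alternative that matched is non-null.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    tokens.add(m.group(g));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String rawData = "192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] 'GET /cgi-bin/try/ HTTP/1.0' 200 3395";
        for (String token : tokenize(rawData)) {
            System.out.println(token);
        }
    }
}
```

For the sample line this yields seven tokens: the IP address, the two "-" placeholders, the timestamp without its brackets, the request without its quotes, and then 200 and 3395 separately. Dropping the "-" entries or joining the last two fields is left to the caller.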

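Answer 1 (score: 0)

Take a look at the following code. Put the special characters that you don't want to appear in the tokens into the "delim" string of the snippet below.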

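// Requires: import java.util.Scanner; import java.util.StringTokenizer;
// "scan" was not declared in the original snippet; a Scanner on standard input is assumed here
Scanner scan = new Scanner(System.in);
String s = scan.nextLine();
// Every character in delim separates tokens and is dropped from the output
String delim = "!,?._'@ ";
StringTokenizer st = new StringTokenizer(s, delim);
System.out.println(st.countTokens());
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}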
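
Note that listing the special characters in delim removes them, but the text between them is still split at every space. To keep a bracketed or quoted section together while staying with StringTokenizer, one option is to ask for the delimiters back (third constructor argument true) and glue the pieces together between an opening and a closing character. The sketch below is only an illustration: QuoteAwareTokenizer and tokenize are made-up names, it handles just [...] and '...' with no nesting, and an apostrophe inside an ordinary word would wrongly start a group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original answer.
class QuoteAwareTokenizer {
    static List<String> tokenize(String line) {
        // returnDelims = true, so spaces, brackets and quotes arrive as tokens too.
        StringTokenizer st = new StringTokenizer(line, " []'", true);
        List<String> tokens = new ArrayList<>();
        StringBuilder group = null; // non-null while inside [...] or '...'
        String closer = null;       // delimiter that ends the current group

        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (group != null) {
                if (t.equals(closer)) {
                    tokens.add(group.toString());
                    group = null;
                } else {
                    group.append(t); // spaces inside a group are kept
                }
            } else if (t.equals("[")) {
                group = new StringBuilder();
                closer = "]";
            } else if (t.equals("'")) {
                group = new StringBuilder();
                closer = "'";
            } else if (!t.equals(" ") && !t.equals("]")) {
                tokens.add(t); // ordinary token; spaces and stray "]" are dropped
            }
        }
        if (group != null) {
            tokens.add(group.toString()); // unterminated group: keep what was read
        }
        return tokens;
    }

    public static void main(String[] args) {
        String rawData = "192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] 'GET /cgi-bin/try/ HTTP/1.0' 200 3395";
        for (String token : tokenize(rawData)) {
            System.out.println(token);
        }
    }
}
```

On the sample line from the question this keeps "28/Jul/2006:10:27:10 -0300" and "GET /cgi-bin/try/ HTTP/1.0" as single tokens, while the remaining fields stay separate.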