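How to parse an input file

Time: 2012-11-02 15:33:47

Tags: java parsing java.util.scanner

I need to parse this input file and I can't figure out how to do it. I tried using Scanner.useDelimiter(), but I still run into problems. Can anyone explain how to do this correctly?

Here is one line from the input file: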

  

    200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1" 200 52464 "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album" "Opera/6.01 (Windows 98; U) [en]"

It is supposed to be split into these parts:

  1. address = 200.88.223.98

  2. date = 01/Feb/2007:04:02:22 -0500

  3. request = GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1

  4. status = 200

  5. bytes = 52464

  6. refer = http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album

  7. agent = Opera/6.01 (Windows 98; U) [en]

Here is the part of my code that tries to parse it:

    Scanner scan = new Scanner(input);
    scan.useDelimiter("[-']+");
    while (scan.hasNextLine()) 
    {
        String address = scan.next();
        String date = scan.next();
        String request = scan.next();
        int status = scan.nextInt();
        int bytes = scan.nextInt();
        String refer = scan.next();
        String agent = scan.next(); 
    }
    

It produces the following error:

    Exception in thread "main" java.util.InputMismatchException      
      at java.util.Scanner.throwFor(Scanner.java:840) 
      at java.util.Scanner.next(Scanner.java:1461) 
      at java.util.Scanner.nextInt(Scanner.java:2091) 
      at java.util.Scanner.nextInt(Scanner.java:2050) 
      at Analyzer.start(Unknown Source) 
      at Driver.main(Unknown Source) 
    Java Result: 1
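
One likely reason for the exception: the delimiter `[-']+` splits only on hyphens and apostrophes, not on spaces, so the fourth token (the one `nextInt()` receives) begins with `0500]` and runs on to the next hyphen or the end of the input. The sketch below lists the tokens that a `Scanner` yields with that delimiter. The class name `DelimiterDemo`, the helper `tokens`, and the shortened log line are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token a Scanner yields for the line with the given delimiter regex.
    static List<String> tokens(String line, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner scan = new Scanner(line)) {
            scan.useDelimiter(delimiter);
            while (scan.hasNext()) {
                out.add(scan.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened form of the log line from the question (assumed for brevity).
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /gallery/ HTTP/1.1\" 200 52464";
        List<String> found = tokens(line, "[-']+");
        // Angle brackets make leading and trailing spaces in each token visible.
        for (int i = 0; i < found.size(); i++) {
            System.out.println(i + " : <" + found.get(i) + ">");
        }
    }
}
```

With this input the scanner yields four tokens, and the fourth is not an integer, which matches the `nextInt()` frame in the stack trace.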
    

1 Answer:

Answer 0 (score: 0)

Try this: split the line on spaces and extract the data from the resulting array.

    String s = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";

    // Split on single spaces; quoted fields that contain spaces end up in several elements
    String[] arr = s.split(" ");

    // Print each element with its index
    for (int i = 0; i < arr.length; i++) {
        System.out.println(i + " : " + arr[i]);
    }

The output is:

    0 : 200.88.223.98
    1 : -
    2 : -
    3 : [01/Feb/2007:04:02:22
    4 : -0500]
    5 : "GET
    6 : /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852
    7 : HTTP/1.1"
    8 : 200
    9 : 52464
    10 : "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album"
    11 : "Opera/6.01
    12 : (Windows
    13 : 98;
    14 : U)
    15 : [en]"

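So element 0 gives you the IP address, elements 3 and 4 together give you the date, and elements 5 through 7 give you the request (after stripping the surrounding brackets and quotes). Elements 8 and 9 are the status and the byte count, and element 10 is the referrer. The user agent contains spaces, so it is spread over elements 11 through 15 and has to be joined back together.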
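
A minimal sketch of that extraction step, reassembling the seven fields from the split array. The class name `LogLineSplitter` and the helpers `join`, `strip`, and `parse` are hypothetical. The sketch assumes every line has the same shape as the sample: a three-part request, a referrer without spaces, and a user agent as the last quoted field. Lines that break those assumptions would need a more careful parser.

```java
import java.util.Arrays;

class LogLineSplitter {
    // Joins arr[from..to] (inclusive) back together with single spaces.
    static String join(String[] arr, int from, int to) {
        return String.join(" ", Arrays.copyOfRange(arr, from, to + 1));
    }

    // Drops the first and last character, i.e. the enclosing [ ] or " ".
    static String strip(String s) {
        return s.substring(1, s.length() - 1);
    }

    // Returns address, date, request, status, bytes, refer, agent (in that order).
    static String[] parse(String line) {
        String[] arr = line.split(" ");
        String address = arr[0];
        String date = strip(join(arr, 3, 4));        // "[01/Feb/2007:04:02:22" + "-0500]"
        String request = strip(join(arr, 5, 7));     // method, path, protocol
        String status = arr[8];
        String bytes = arr[9];
        String refer = strip(arr[10]);               // assumes no spaces in the referrer
        String agent = strip(join(arr, 11, arr.length - 1));
        return new String[] {address, date, request, status, bytes, refer, agent};
    }

    public static void main(String[] args) {
        // Shortened form of the sample line (assumed for brevity).
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /gallery/ HTTP/1.1\" 200 52464 "
                + "\"http://cs.tcnj.edu/gallery/main.php\" "
                + "\"Opera/6.01 (Windows 98; U) [en]\"";
        String[] names = {"address", "date", "request", "status", "bytes", "refer", "agent"};
        String[] fields = parse(line);
        for (int i = 0; i < fields.length; i++) {
            System.out.println(names[i] + " = " + fields[i]);
        }
    }
}
```

The status and byte count are kept as strings here; `Integer.parseInt` can convert them once you know the values are numeric.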