Parser using Scanner does not work with the "#" symbol

Time: 2014-10-25 19:57:57

Tags: java parsing

I am trying to parse an input file that looks like this:

#*Nonmonotonic logic - context-dependent reasoning.
#@Victor W. Marek,Miroslaw Truszczynski
#t1993
#cArtificial Intelligence
#index3003478
#%3005567
#%3005568
#%3005569
#!abstracst

#*Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
#@Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
#t1993
#cArtificial Intelligence
#index3005557
#%3005567
#!abstracts2

I am writing a parser for this file, and I want output like the following:

Title: Nonmonotonic logic - context-dependent reasoning.
Author: Victor W. Marek,Miroslaw Truszczynski
Year: 1993
Domain: Artificial Intelligence
Index: 3003478
Citation: 3005567
Citation: 3005568
Citation: 3005569
Abstract: Abstract

Title: Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
Author: Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
Year: 1993
Domain: Artificial Intelligence
Index: 3005557
Citation: 3005567
Abstract: Abstract2

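The code I have written so far is below, but it produces output completely different from what I expected, and I cannot figure out why the Scanner reads the input the wrong way. It seems to read only the first character of each line as the title, rather than the first line of each record. I thought perhaps Scanner cannot read the "#" symbol, but I may be wrong about that too. To show what goes wrong: if I print only the title, the output I get is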

Title:*
Title:@
Title:t
Title:c
Title:i
Title:%
Title:!
Title: 
Title:*
Title:@
Title:t
Title:c
Title:i
Title:i
Title:%
Title:!
Title:
Done.

If I print both the title and the author, the output I get is as follows:

Title:*
Author:Nonmonotonic logic - context-dependent reasoning.
Title:@
Author:Victor W. Marek,Miroslaw Truszczynski
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3003478
Title:%
Author: 
Title:!
Title: 
Author: 
Title:*
Author:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:@
Author:Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3005557
Title:i
Author:ndex3005557
Title:%
Author: 
Title:!
Title: 
Author: 
Done.

The code is as follows:

import java.sql.*;
import java.util.Scanner;
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Citation{

    public static void main (String[] args) throws SQLException,
    ClassNotFoundException, IOException{

        Citation parser = new Citation("D:/test.txt");
        parser.processLineByLine();
        log("Done.");

    }

     public Citation(String aFileName){
         fFilePath = Paths.get(aFileName);
     }

     public final void processLineByLine() throws IOException {
         try (Scanner scanner =  new Scanner(fFilePath, ENCODING.name())){
              while (scanner.hasNextLine()){
                  processLine(scanner.nextLine());
              }      
            }
     }

     protected void processLine(String aLine){


                Scanner scanner = new Scanner(aLine);
                scanner.useDelimiter("\n");

                while(scanner.hasNext()){

                //   Scanner scanner = new Scanner(aLine);
                     scanner.useDelimiter("#*");
                     if(scanner.hasNext()){
                         String title = scanner.next();
                         System.out.println("Title:" + title);

                     }

                //   Scanner scanner3 = new Scanner(aLine);
                     scanner.useDelimiter("#@");
                     if(scanner.hasNext()){
                         String author = scanner.next();
                //       System.out.println(author);
                     }

                //   Scanner scanner4 = new Scanner(aLine);
                     scanner.useDelimiter("#t");
                     if(scanner.hasNext()){
                         String year = scanner.next();
                //       System.out.println(year);
                     }
                //   Scanner scanner5 = new Scanner(aLine);
                     scanner.useDelimiter("#c");
                     if(scanner.hasNext()){
                         String domain = scanner.next();
                    //   System.out.println(domain);

                     }
                //   Scanner scanner6 = new Scanner(aLine);
                     scanner.useDelimiter("#index");
                     if(scanner.hasNext()){
                         String index = scanner.next();
                        // System.out.println(index);
                     }               
                //   Scanner scanner7 = new Scanner(aLine);
                     scanner.useDelimiter("#%");
                     if(scanner.hasNext()){
                         String cite = scanner.next();
                    //   System.out.println(cite);

                     }
                //   Scanner scanner8 = new Scanner(aLine);
                     scanner.useDelimiter("#!");
                     if(scanner.hasNext()){
                         String abstracts = scanner.next();
                        // System.out.println(abstracts);

                     }



                }





          }

          // PRIVATE 
          private final Path fFilePath;
          private final static Charset ENCODING = StandardCharsets.UTF_8;  

          private static void log(Object aObject){
            System.out.println(String.valueOf(aObject));
          }


        } 

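When I change the "#*" delimiter to "#//*", the title is read, but every other line is also read as a title. My other delimiters are not detected. The output I get is as follows: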

Title:Nonmonotonic logic - context-dependent reasoning.
Title:#@Victor W. Marek,Miroslaw Truszczynski
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3003478
Title:#% 
Title:#!
Title:  
Title:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:#@Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3005557
Title:#index3005557
Title:#% 
Title:#!
Title:  

2 Answers:

Answer 0: (score: 2)

Assuming the file format will not change any time soon, modify processLine as follows:

protected void processLine(String aLine) {
   if (aLine.trim().isEmpty()) {
       System.out.println(); // executed when an empty line is read (record separator)
   } else if (aLine.startsWith("#*")) {
      System.out.println("Title:" + aLine.substring(2)); // or, you can also do
      // System.out.println("Title:" + aLine.substring("#*".length()));
   } else if (aLine.startsWith("#@")) {
      System.out.println("Author:" + aLine.substring(2));
   }
   // Proceed in similar fashion for the other prefixes: #t, #c, #index, #%, #!
}
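
The same idea can be written as a lookup table instead of a long if/else chain. The following is only a sketch under stated assumptions: the class name PrefixParser and the method formatLine are made up for illustration, and the prefix-to-label table is inferred from the sample input and the desired output in the question, not from any format specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixParser {
    // Prefix-to-label table inferred from the question's sample file (an assumption).
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("#*", "Title");
        LABELS.put("#@", "Author");
        LABELS.put("#t", "Year");
        LABELS.put("#c", "Domain");
        LABELS.put("#index", "Index");
        LABELS.put("#%", "Citation");
        LABELS.put("#!", "Abstract");
    }

    // Returns "Label: value" for a recognized line, "" for a blank line,
    // and the line unchanged when no known prefix matches.
    public static String formatLine(String line) {
        if (line.trim().isEmpty()) {
            return "";
        }
        for (Map.Entry<String, String> entry : LABELS.entrySet()) {
            String prefix = entry.getKey();
            if (line.startsWith(prefix)) {
                return entry.getValue() + ": " + line.substring(prefix.length());
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String[] sample = {
            "#*Nonmonotonic logic - context-dependent reasoning.",
            "#@Victor W. Marek,Miroslaw Truszczynski",
            "#t1993",
            "#index3003478",
            "#%3005567"
        };
        for (String line : sample) {
            System.out.println(formatLine(line));
        }
    }
}
```

In the question's class, processLine could simply print the result of formatLine(aLine). No regular expressions are involved, so characters such as * need no escaping.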

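Answer 1: (score: 2)

The problem is that you are using scanner.useDelimiter("#*");. This method takes a regular expression, in which * means zero or more occurrences of the preceding symbol (# in your case). Such a pattern also matches the empty string, which is why the input is split at every character. So use scanner.useDelimiter("#\\*"); in your case.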
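
To see the difference, here is a small sketch that runs the same line through a Scanner with the unescaped and the escaped delimiter. The class name DelimiterDemo and the helper tokens are hypothetical names used only for this illustration; Pattern.quote is shown as an alternative to escaping by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Collects every token the Scanner produces for the given delimiter regex.
    public static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "#*Nonmonotonic logic";

        // "#*" means "zero or more '#'", so it can match the empty string and
        // the line is chopped into many tiny tokens; the first one is "*",
        // as in the question's output.
        System.out.println(tokens(line, "#*"));

        // "#\\*" matches the literal two-character sequence "#*".
        System.out.println(tokens(line, "#\\*")); // prints [Nonmonotonic logic]

        // Pattern.quote builds a literal pattern without manual escaping.
        System.out.println(tokens(line, Pattern.quote("#*"))); // prints [Nonmonotonic logic]
    }
}
```

Pattern.quote is convenient for delimiters such as "#*" that contain regex metacharacters, since it avoids having to remember which characters need a backslash.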