I am trying to parse an input file that looks like this:
#*Nonmonotonic logic - context-dependent reasoning.
#@Victor W. Marek,Miroslaw Truszczynski
#t1993
#cArtificial Intelligence
#index3003478
#%3005567
#%3005568
#%3005569
#!abstracst
#*Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
#@Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
#t1993
#cArtificial Intelligence
#index3005557
#%3005567
#!abstracts2
I am writing a parser for this file, and the output I am looking for is:
Title: Nonmonotonic logic - context-dependent reasoning.
Author: Victor W. Marek,Miroslaw Truszczynski
Year: 1993
Domain: Artificial Intelligence
Index: 3003478
Citation: 3005567
Citation: 3005568
Citation: 3005569
Abstract: Abstract
Title: Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
Author: Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
Year: 1993
Domain: Artificial Intelligence
Index: 3005557
Citation: 3005567
Abstract: Abstract2
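The code I have written so far is below, but it produces output completely different from what I expect, and I cannot figure out why the scanner reads the file the wrong way. It seems to read only the first character of each line as the title, rather than the first line of each record. I thought the scanner might not be reading the "#" symbol, but I may be wrong about that too. To narrow down the problem: if I print only the title, for example, the output I get is: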
Title:*
Title:@
Title:t
Title:c
Title:i
Title:%
Title:!
Title:
Title:*
Title:@
Title:t
Title:c
Title:i
Title:i
Title:%
Title:!
Title:
Done.
If I print both the title and the author, the output I get is:
Title:*
Author:Nonmonotonic logic - context-dependent reasoning.
Title:@
Author:Victor W. Marek,Miroslaw Truszczynski
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3003478
Title:%
Author:
Title:!
Title:
Author:
Title:*
Author:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:@
Author:Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3005557
Title:i
Author:ndex3005557
Title:%
Author:
Title:!
Title:
Author:
Done.
The code is as follows:
import java.sql.*;
import java.util.Scanner;
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Citation{

    public static void main (String[] args) throws SQLException,
            ClassNotFoundException, IOException{
        Citation parser = new Citation("D:/test.txt");
        parser.processLineByLine();
        log("Done.");
    }

    public Citation(String aFileName){
        fFilePath = Paths.get(aFileName);
    }

    public final void processLineByLine() throws IOException {
        try (Scanner scanner = new Scanner(fFilePath, ENCODING.name())){
            while (scanner.hasNextLine()){
                processLine(scanner.nextLine());
            }
        }
    }

    protected void processLine(String aLine){
        Scanner scanner = new Scanner(aLine);
        scanner.useDelimiter("\n");
        while(scanner.hasNext()){
            // Scanner scanner = new Scanner(aLine);
            scanner.useDelimiter("#*");
            if(scanner.hasNext()){
                String title = scanner.next();
                System.out.println("Title:" + title);
            }
            // Scanner scanner3 = new Scanner(aLine);
            scanner.useDelimiter("#@");
            if(scanner.hasNext()){
                String author = scanner.next();
                // System.out.println(author);
            }
            // Scanner scanner4 = new Scanner(aLine);
            scanner.useDelimiter("#t");
            if(scanner.hasNext()){
                String year = scanner.next();
                // System.out.println(year);
            }
            // Scanner scanner5 = new Scanner(aLine);
            scanner.useDelimiter("#c");
            if(scanner.hasNext()){
                String domain = scanner.next();
                // System.out.println(domain);
            }
            // Scanner scanner6 = new Scanner(aLine);
            scanner.useDelimiter("#index");
            if(scanner.hasNext()){
                String index = scanner.next();
                // System.out.println(index);
            }
            // Scanner scanner7 = new Scanner(aLine);
            scanner.useDelimiter("#%");
            if(scanner.hasNext()){
                String cite = scanner.next();
                // System.out.println(cite);
            }
            // Scanner scanner8 = new Scanner(aLine);
            scanner.useDelimiter("#!");
            if(scanner.hasNext()){
                String abstracts = scanner.next();
                // System.out.println(abstracts);
            }
        }
    }

    // PRIVATE
    private final Path fFilePath;
    private final static Charset ENCODING = StandardCharsets.UTF_8;

    private static void log(Object aObject){
        System.out.println(String.valueOf(aObject));
    }
}
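When I change the "#*" delimiter to "#//*", the title is read correctly, but every other line is also read as a title; none of my other delimiters is detected. The output I get is: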
Title:Nonmonotonic logic - context-dependent reasoning.
Title:#@Victor W. Marek,Miroslaw Truszczynski
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3003478
Title:#%
Title:#!
Title:
Title:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:#@Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3005557
Title:#index3005557
Title:#%
Title:#!
Title:
Answer 0 (score: 2)
Assuming the file format will not change any time soon, modify processLine as follows:
protected void processLine(String aLine) {
    if (aLine.trim().isEmpty()) {
        System.out.println(); // executed when an empty line is read
    } else if (aLine.startsWith("#*")) {
        System.out.println("Title:" + aLine.substring(2));
        // or, equivalently:
        // System.out.println("Title:" + aLine.substring("#*".length()));
    } else if (aLine.startsWith("#@")) {
        System.out.println("Author:" + aLine.substring(2));
    }
    // ... handle the remaining prefixes (#t, #c, #index, #%, #!) in the same fashion
}
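The prefix-based approach can be sketched as a small table-driven class covering all seven markers from the question. This is only an illustration: the class name `PrefixLineParser`, the method `formatLine`, and the in-memory sample lines are made up for the example, and the labels are taken from the expected output in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixLineParser {

    // Marker -> output label. None of these markers is a prefix of another,
    // so the lookup order does not matter.
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("#*", "Title");
        LABELS.put("#@", "Author");
        LABELS.put("#t", "Year");
        LABELS.put("#c", "Domain");
        LABELS.put("#index", "Index");
        LABELS.put("#%", "Citation");
        LABELS.put("#!", "Abstract");
    }

    /** Returns "Label: value" for a recognized line, or null for blank or unknown lines. */
    static String formatLine(String line) {
        for (Map.Entry<String, String> entry : LABELS.entrySet()) {
            String marker = entry.getKey();
            if (line.startsWith(marker)) {
                // Everything after the marker is the field value.
                return entry.getValue() + ": " + line.substring(marker.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; in the question these lines come from the file.
        String[] sample = {
            "#*Nonmonotonic logic - context-dependent reasoning.",
            "#@Victor W. Marek,Miroslaw Truszczynski",
            "#t1993",
            "#cArtificial Intelligence",
            "#index3003478",
            "#%3005567",
            "#!abstracst",
            ""
        };
        for (String line : sample) {
            String formatted = formatLine(line);
            if (formatted != null) {
                System.out.println(formatted);
            }
        }
    }
}
```

To use it with the question's code, `processLine` could call `formatLine(aLine)` and print the result when it is not null. Keeping the markers in a map means a new field type needs one extra entry rather than another `else if` branch.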
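Answer 1 (score: 2)
The problem is that you are calling scanner.useDelimiter("#*");. This method expects a regular expression, in which * is a quantifier meaning "zero or more occurrences" of the preceding element (# in your case). The pattern therefore also matches the empty string, so the scanner splits after every single character. Escape the asterisk instead and use scanner.useDelimiter("#\\*"); in your case.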
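A minimal sketch comparing the unescaped and escaped delimiters on one line of the input. The class `DelimiterDemo` and its helper methods are hypothetical names for this example; `Pattern.quote` is a standard alternative to escaping by hand.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

final class DelimiterDemo {

    /** Returns the first token of the line when split on the given regex delimiter. */
    static String firstToken(String line, String delimiterRegex) {
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter(delimiterRegex);
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    /** Same as firstToken, but treats the delimiter as literal text rather than a regex. */
    static String firstTokenLiteral(String line, String literalDelimiter) {
        return firstToken(line, Pattern.quote(literalDelimiter));
    }

    public static void main(String[] args) {
        String line = "#*Nonmonotonic logic";

        // "#*" is the regex "zero or more '#'", which also matches the empty
        // string, so the first token is the single character "*".
        System.out.println("Title:" + firstToken(line, "#*"));

        // "#\\*" matches the literal two-character marker "#*".
        System.out.println("Title:" + firstToken(line, "#\\*"));

        // Pattern.quote avoids escaping by hand.
        System.out.println("Title:" + firstTokenLiteral(line, "#*"));
    }
}
```

The first call reproduces the `Title:*` output from the question; the other two yield the full title. Note that markers such as `#@` or `#t` contain no regex metacharacters, so only `#*` needs escaping among the delimiters used in the question.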