Java: parsing information sequentially from a file

Date: 2015-08-03 12:31:02

Tags: java file parsing

Suppose my file is structured as follows:

  

Line 0:

354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

Line 1:

543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

And so on...

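Now I want to extract the information marked in my example (highlighted with a grey background in the original post). The sequence AA is always present (and can be used as a break to skip to the next line), while the information strings vary in length.

What is the best way to parse this information? A BufferedReader with if/then/else logic, or is there some kind of parser that can be told to read a field of length XYZ, then read everything into a string until AA is found, and then skip to the next line?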

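6 answers:

Answer 0 (score: 1)

I would read the file line by line and match each line against a regular expression. I hope the comments in the code below are detailed enough.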

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

// Read file line by line; try-with-resources closes the reader when done
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
  String line;
  while ((line = br.readLine()) != null) {
    // Match line against our pattern
    Matcher m = p.matcher(line);
    if (m.find()) {
      // Line is valid, process it however you want
      // m.group(1) contains the number
      // m.group(2) contains the text between number and AA
      // (including the whitespace just before AA, so trim() it if needed)
    } else {
      // Line has invalid format (pattern does not match)
    }
  }
}

Explanation of the regular expression (Pattern) I used:

^([0-9]+)\s+(([^A]|A[^A])+)AA

^               matches the start of the line
([0-9]+)        matches one or more digits (group 1, the number)
\s+             matches one or more whitespace characters
(([^A]|A[^A])+) matches a run that stops before the first AA: each step is either a non-A character or an A followed by a non-A character (group 2)
AA              matches the terminating AA
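
To make the breakdown above concrete, here is a minimal sketch that applies the same pattern to the first sample line from the question. The class name `RegexDemo` and the helper `parse` are hypothetical names chosen for illustration; the call to `trim()` is an addition, needed because group 2 still ends with the space that precedes AA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    // Same pattern as in the answer above
    static final Pattern LINE = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    /** Returns {number, text} for a matching line, or null if the line does not match. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) ends with the whitespace just before AA, so trim it
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = parse(
            "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED");
        System.out.println(parts[0]); // 354858
        System.out.println(parts[1]); // Some String That Is Important
    }
}
```

Returning null for a non-matching line is only one possible choice; throwing an exception or skipping the line would work just as well, depending on how strict the parsing should be.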

Update in response to a comment:

If every line starts with a | character, the expression looks like this:

^\|([0-9]+)\s+(([^A]|A[^A])+)AA

In Java, you need to escape it like this:

"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"

The | character has a special meaning in regular expressions (alternation) and must be escaped to match it literally.
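
The effect of the escaping can be seen in a small sketch. The class `PipePrefixDemo` and its two helper methods are hypothetical names for illustration, and the shortened pattern `^|([0-9]+)` is used only to show what happens when the pipe is left unescaped: it is read as "start of line OR digits", so the empty first branch matches and group 1 stays unset.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipePrefixDemo {
    // Escaped pipe: matches a literal | at the start of the line
    static final Pattern ESCAPED =
        Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");
    // Unescaped pipe: read as alternation, "start of line" OR "one or more digits"
    static final Pattern UNESCAPED = Pattern.compile("^|([0-9]+)");

    /** Returns the number from a line such as "|354858 Text AA ...", or null if it does not match. */
    public static String numberOf(String line) {
        Matcher m = ESCAPED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Returns what the unescaped pattern captures in group 1 on its first match. */
    public static String unescapedGroup1(String line) {
        Matcher m = UNESCAPED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "|354858 Some String That Is Important AA OTHER STUFF";
        System.out.println(numberOf(line));        // 354858
        System.out.println(unescapedGroup1(line)); // null, the empty "^" branch matched first
    }
}
```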

Answer 1 (score: 1)

Without more information, it is impossible to say which solution would suit you best.

One solution might be

String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));

Output

split = [354858, Some String That Is Important]

Answer 2 (score: 1)

You can read the file line by line and discard everything from the AA character sequence onward:

final String charSequence = "AA";
String line;
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")));
try {
    while ((line = r.readLine()) != null) {
        int pos = line.indexOf(charSequence);
        if (pos > 0) {
            String myImportantStuff = line.substring(0, pos);
            // do something with your useful string
        }
    }
} finally {
    r.close();
}

Answer 3 (score: 0)

Use the regular expression: .+?(?=AA)

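
Since this answer gives only the expression, here is a minimal sketch of how it could be used from Java. The class `LookaheadDemo` and the method `beforeMarker` are hypothetical names; the `trim()` call is an addition that removes the space left in front of AA. The reluctant `.+?` grows one character at a time until the lookahead `(?=AA)` succeeds, so the match ends just before the first AA without consuming it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Reluctant match of everything up to (but not including) the first AA
    static final Pattern BEFORE_AA = Pattern.compile(".+?(?=AA)");

    /** Returns the trimmed text before the first AA, or null if the line has no AA. */
    public static String beforeMarker(String line) {
        Matcher m = BEFORE_AA.matcher(line);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        System.out.println(beforeMarker(
            "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED"));
        // 354858 Some String That Is Important
    }
}
```

Unlike the pattern in answer 0, this one does not separate the number from the text, so the result would still need to be split if the two parts are wanted individually.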

Answer 4 (score: 0)

Here is a solution:

public static void main(String[] args) {
    InputStream source; //select a text source (should be a FileInputStream)
    {
        String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
                "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
    }

    // Decode with the same charset that was used to encode the bytes above
    try(BufferedReader stream = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) {
        Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
        while(true) {
            String line = stream.readLine();
            if(line == null) {
                break;
            }
            Matcher matcher = pattern.matcher(line);
            if(matcher.matches()) {
                String someNumber = matcher.group(1);
                String someText = matcher.group(2);
                //do something with someNumber and someText
            } else {
                throw new ParseException(line, 0);
            }
        }
    } catch (IOException | ParseException e) {
        e.printStackTrace(); // TODO ...
    }
}

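Answer 5 (score: 0)

You can use a regular expression, but if you know that every line contains AA and you want the content up to AA, you can simply call substring(int, int) to get the part of the line before AA.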

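public List<String> read(Path path) throws IOException {
    // Files.lines keeps the file open, so close the stream when done
    try (Stream<String> lines = Files.lines(path)) {
        return lines.map(this::parseLine)
                    .collect(Collectors.toList());
    }
}

public String parseLine(String line) {
    int index = line.indexOf("AA");
    if (index < 0) {
        // substring(0, -1) would throw a less helpful StringIndexOutOfBoundsException
        throw new IllegalArgumentException("Line does not contain AA: " + line);
    }
    return line.substring(0, index);
}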

Here is a version of read that does not use the Java 8 Stream API:

public List<String> read(Path path) throws IOException {
    List<String> content = new ArrayList<>();

    try(BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))){
        String line;
        while((line = reader.readLine()) != null){
            content.add(parseLine(line));
        }
    }

    return content;
}