Parsing a file for lines that match one pattern and returning the following lines that match a different pattern

Date: 2016-05-20 12:27:28

Tags: java java.util.scanner

I have a log file that I am trying to parse as described below.

The file to be parsed looks like this:

filename.......f1
This test is associated with file 1 - ignore it
filename.......f2
This test is associated with file 2 -ignore it
filename.......f3
This test is associated with file 3 - line 1 - do not ignore it
This test is associated with file 3 - line 2 - do not ignore it
filename.......f4
This test is associated with file 4 - ignore it
filename.......f5
This test is associated with file 5 - do not ignore it

Assume we are matching the text in the file with the following regex patterns:

MATCHING_PATTERN1 - for "filename.......f[X]"
MATCHING_PATTERN2 - for "This test is associated with file [X] - do not ignore it"
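As a sketch, the two placeholders could be written as concrete Java regular expressions. The expressions below are assumptions inferred only from the sample lines, and the class and method names (LogPatterns, isHeader, isKeep) are hypothetical. Note that String.matches requires the whole line to match, so the second pattern uses .* to allow extra text such as "line 1 - " before "do not ignore it". Precompiling with Pattern.compile also avoids recompiling the regex for every line, which String.matches does on each call.

```java
import java.util.regex.Pattern;

public class LogPatterns {

    // Hypothetical regex for MATCHING_PATTERN1: "filename", one or more dots, "f", a file number.
    static final Pattern HEADER = Pattern.compile("filename\\.+f\\d+");

    // Hypothetical regex for MATCHING_PATTERN2: an "associated with file N - ..." line
    // that ends in "do not ignore it" (with or without a "line N - " part in between).
    static final Pattern KEEP =
            Pattern.compile("This test is associated with file \\d+ - .*do not ignore it");

    static boolean isHeader(String line) {
        // matcher(...).matches() requires the entire line to match, like String.matches
        return HEADER.matcher(line).matches();
    }

    static boolean isKeep(String line) {
        return KEEP.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHeader("filename.......f3")); // true
        System.out.println(isKeep("This test is associated with file 3 - line 1 - do not ignore it")); // true
        System.out.println(isKeep("This test is associated with file 1 - ignore it")); // false
    }
}
```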

I am using the following code:

package org.c2pfiscbk.tutorial;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class TestLogParser {

    public static void main(String[] args) {
        LogParser lp = new LogParser();
        lp.logReader();
    }

}

class LogParser {

    public void logReader(){
        File input = new File("file_location/fileName.log");
        // try-with-resources closes the Scanner (and the file) when the loop ends
        try (Scanner scanner = new Scanner(input)) {
            // hasNextLine() is the check that pairs with nextLine()
            while (scanner.hasNextLine()) {
                String dLine = scanner.nextLine();
                if (dLine.matches("MATCHING_PATTERN1")) {
                    System.out.println(dLine);
                }
                else {
                    if (dLine.matches("MATCHING_PATTERN2")) {
                        System.out.println(dLine);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

The output of the code above is:

filename.......f1
filename.......f2
filename.......f3
This test is associated with file 3 - line 1 - do not ignore it
This test is associated with file 3 - line 2 - do not ignore it
filename.......f4
filename.......f5
This test is associated with file 5 - do not ignore it

However, what I need is:

filename.......f3
This test is associated with file 3 - line 1 - do not ignore it
This test is associated with file 3 - line 2 - do not ignore it
filename.......f5
This test is associated with file 5 - do not ignore it

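In other words, I only want a file name (matched by MATCHING_PATTERN1) if it is followed by at least one line of the relevant text (matched by MATCHING_PATTERN2), and I want that text (matched by MATCHING_PATTERN2) itself as well.

I do not want to use sed, egrep, or any other external tool.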

5 answers:

Answer 0 (score: 1)

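You need a boolean variable that records whether the first match (the file name) still has to be printed, since you want to print it only once for all of its associated MATCHING_PATTERN2 lines. Then, as shown in the answer above, you can use a cache-style variable so that the file name is printed only once.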

String fileName = null;
boolean printFilename = false;
while (scanner.hasNextLine()) {
    String dLine = scanner.nextLine();
    if (dLine.matches("MATCHING_PATTERN1")) {
        // remember the header, but don't print it yet
        fileName = dLine;
        printFilename = true;
    }
    else {
        if (dLine.matches("MATCHING_PATTERN2")) {
            // print the header only before the first matching line that follows it
            if (printFilename) {
                System.out.println(fileName);
                printFilename = false;
            }
            System.out.println(dLine);
        }
    }
}
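To check that the boolean-flag approach produces exactly the required output, here is a self-contained sketch that runs it over the sample log. The concrete regexes and the names FlagFilter and filter are assumptions for illustration; collecting into a List instead of printing directly makes the result easy to test. Scanner(String) scans the string's content, so a real run would pass new Scanner(new File(...)) instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FlagFilter {

    // Hypothetical concrete regexes standing in for MATCHING_PATTERN1 / MATCHING_PATTERN2.
    static final String PATTERN1 = "filename\\.+f\\d+";
    static final String PATTERN2 = "This test is associated with file \\d+ - .*do not ignore it";

    // Remember the last header and emit it once, just before the first
    // PATTERN2 line that follows it; headers with no such line are dropped.
    static List<String> filter(Scanner scanner) {
        List<String> out = new ArrayList<>();
        String fileName = null;
        boolean printFilename = false;
        while (scanner.hasNextLine()) {
            String dLine = scanner.nextLine();
            if (dLine.matches(PATTERN1)) {
                fileName = dLine;
                printFilename = true;
            } else if (dLine.matches(PATTERN2)) {
                if (printFilename) {
                    out.add(fileName);
                    printFilename = false;
                }
                out.add(dLine);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The sample log from the question, inlined instead of read from a file
        String log = String.join("\n",
                "filename.......f1",
                "This test is associated with file 1 - ignore it",
                "filename.......f2",
                "This test is associated with file 2 -ignore it",
                "filename.......f3",
                "This test is associated with file 3 - line 1 - do not ignore it",
                "This test is associated with file 3 - line 2 - do not ignore it",
                "filename.......f4",
                "This test is associated with file 4 - ignore it",
                "filename.......f5",
                "This test is associated with file 5 - do not ignore it");
        filter(new Scanner(log)).forEach(System.out::println);
    }
}
```

Running main prints the f3 header, its two lines, the f5 header, and its line, which is the required output from the question.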

Answer 1 (score: 0)

Just store the file name in a variable and print it only when a line matches MATCHING_PATTERN2:
String fileName = null;
while (scanner.hasNextLine()) {
    String dLine = scanner.nextLine();
    if (dLine.matches("MATCHING_PATTERN1")) {
        fileName = dLine;
    }
    else {
        if (dLine.matches("MATCHING_PATTERN2")) {
            // prints the file name before every matching line,
            // so a header with two matching lines (f3) is printed twice
            System.out.println(fileName);
            System.out.println(dLine);
        }
    }
}

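Answer 2 (score: 0)

Your output is logical: the first match makes the code print every "filename.......f[X]" line, including the ones you don't want. Store the first match in a variable instead of printing it; then, when the second pattern matches, print that variable (if it has not been printed yet), and it works the way you want: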

String cacheLine = "";
String lastPrintedCacheLine = "";
while (scanner.hasNextLine()) {
    String dLine = scanner.nextLine();
    if (dLine.matches("MATCHING_PATTERN1")) {
        // remember the most recent header instead of printing it
        cacheLine = dLine;
    } else if (dLine.matches("MATCHING_PATTERN2")) {
        // print the cached header only if it has not been printed yet
        if (!cacheLine.equals(lastPrintedCacheLine)) {
            System.out.println(cacheLine);
            lastPrintedCacheLine = cacheLine;
        }
        System.out.println(dLine);
    }
}

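Verified. However, Riggy's answer also works, and at a lower cost.

Note that the {} block after else is superfluous; you can just use else if. That makes the code less cluttered, IMHO.

Answer 3 (score: 0)

Collect the lines to print for each header, and print each group once it is complete (when the next header or the end of the file is reached):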

// needs java.util.List and java.util.ArrayList; assumes the first line is a header
String currentHeader = scanner.hasNextLine() ? scanner.nextLine() : null;
List<String> followingLines = new ArrayList<>();
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    if (line.matches("MATCHING_PATTERN1")) {
        // new header, let's print the lines if there are lines to print
        if (!followingLines.isEmpty()) {
            System.out.println(currentHeader);
            for (String followingLine : followingLines) {
                System.out.println(followingLine);
            }
        }
        // reset
        currentHeader = line;
        followingLines.clear();
    } else if (line.matches("MATCHING_PATTERN2")) {
        followingLines.add(line);
    }
}
// print the last group
if (!followingLines.isEmpty()) {
    System.out.println(currentHeader);
    for (String followingLine : followingLines) {
        System.out.println(followingLine);
    }
}
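A variant of this "collect, then print" idea keeps every header and its matching lines in a LinkedHashMap (which preserves insertion order) and prints only the non-empty groups after the whole input has been read. This is a sketch: the regexes and the names GroupingFilter and group are hypothetical, and it differs from the answer above in that it does not assume the first line is a header.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class GroupingFilter {

    // Hypothetical concrete regexes standing in for MATCHING_PATTERN1 / MATCHING_PATTERN2.
    static final String PATTERN1 = "filename\\.+f\\d+";
    static final String PATTERN2 = "This test is associated with file \\d+ - .*do not ignore it";

    // Map each header to the PATTERN2 lines that follow it, in file order.
    static Map<String, List<String>> group(Scanner scanner) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        List<String> current = null; // lines for the most recent header; null before the first header
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (line.matches(PATTERN1)) {
                current = new ArrayList<>();
                groups.put(line, current); // a repeated header line would replace the earlier group
            } else if (current != null && line.matches(PATTERN2)) {
                current.add(line);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "filename.......f1",
                "This test is associated with file 1 - ignore it",
                "filename.......f3",
                "This test is associated with file 3 - line 1 - do not ignore it",
                "This test is associated with file 3 - line 2 - do not ignore it");
        // Print only the headers that collected at least one line
        group(new Scanner(log)).forEach((header, lines) -> {
            if (!lines.isEmpty()) {
                System.out.println(header);
                lines.forEach(System.out::println);
            }
        });
    }
}
```

The trade-off is memory: the whole filtered result is held until the end, and header lines are assumed to be unique. For a large log, the streaming approaches in the other answers avoid both constraints.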

Answer 4 (score: 0)

You can run a second (inner) loop to get the result:

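String line = scanner.hasNextLine() ? scanner.nextLine() : null;
while (line != null) {
    if (line.matches("MATCHING_PATTERN1")) {
        String header = line;
        boolean headerPrinted = false;
        line = scanner.hasNextLine() ? scanner.nextLine() : null;
        // inner loop: consume every MATCHING_PATTERN2 line that directly follows the header
        while (line != null && line.matches("MATCHING_PATTERN2")) {
            if (!headerPrinted) { // print the header once, and only if it has a match
                System.out.println(header);
                headerPrinted = true;
            }
            System.out.println(line);
            line = scanner.hasNextLine() ? scanner.nextLine() : null;
        }
        // 'line' now holds the first non-matching line (possibly the next header), so it is not skipped
    } else {
        line = scanner.hasNextLine() ? scanner.nextLine() : null;
    }
}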