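Parsing a log file with regular expressions to display multi-line data

Asked: 2017-07-06 15:58:58

Tags: java regex logging

I'm trying to parse a log file to pull out the message text. I'll explain as I go. Here is the code: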

// Print to the interactions pane
try
{
    // Read the log file line by line (log is assumed to be a java.io.File)
    sc = new Scanner(log);
    while (sc.hasNextLine())
    {
        String line = sc.nextLine();
        if (line.contains("LANTALK")) {
            Document doc = Jsoup.parse(line);
            Element idto = doc.select("MBXTO").first();
            Element msg = doc.select("MSGTEXT").first();
            System.out.println(" to " + idto.text() + " " + msg.text());
            System.out.println();
        } // End of if
    } // End of while
} // end of try
catch (FileNotFoundException e)
{
    e.printStackTrace();
}

try
{
    // Print to the output file
    sc = new Scanner(log);
    while (sc.hasNextLine())
    {
        String line = sc.nextLine();
        if (line.contains("LANTALK")) {
            Document doc = Jsoup.parse(line);
            Element idto = doc.select("MBXTO").first();
            Element msg = doc.select("MSGTEXT").first();
            outFile.println(" to " + idto.text() + " " + msg.text());
            outFile.println();
            outFile.println();
        } // End of if
    } // End of while
} // end of try
catch (FileNotFoundException e)
{
    e.printStackTrace();
}

The input comes from a log file. Here is a sample, including the line I filter on:

08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>
<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT><MOBILEADDR>
</MOBILEADDR><LAP>0</LAP><SMS>0</SMS><MSGTEXT>and I talked to him and he 
gave me a credit card number</MSGTEXT>
08:25:20.751 [+] [T:000FF0] [S:1:1:1124:5607:5] LANMSG [15/2 | 0]
08:25:20.945 [+] [T:000FF4] [S:1:1:1124:5607:5] LANMSGTYPESTOPPED [0/2 | 0]
08:25:21.327 [+] [T:000FE8] [S:1:1:1124:5607:5] LANMSGTYPESTARTED [0/2 | 0]

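So far I can filter the lines that contain a message (LANMSG), and from those I can get the recipient's ID (MBXTO). But the sender's ID is on the next line, and I need to pull it out and display it as well ([S:1:1:1124:SENDERID:5]). How can I do that? Here is a copy of the output I currently get: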

to 5760 and I talked to him and he gave me a credit card number

And this is what I need to get:

SENDERID to 5760 and I talked to him and he gave me a credit card number

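Any help you can give me would be great. I'm just not sure how to get at the information I need.

1 answer:

Answer 0 (score: 0)

Your question isn't very clear, and you don't seem to be using a regular expression anywhere in this code... remember to show what you have tried before asking. Anyway, the regex you are looking for is: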

(\d{2}:\d{2}:\d{2}\.\d{3})\s\[D\].+<MBXID>(\d+)<\/MBXID><MBXTO>(\d+)<\/MBXTO>.+<MSGTEXT>(.+)<\/MSGTEXT>

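There is a working example on Regex101.
It should capture:
$1 08:25:20.740
$2 1124
$3 5760
$4 and I talked to him and he gave me a credit card number (note that it also captures the \n, i.e. the line break).
(Also, when reading groups from a Matcher in Java you use matcher.group(number) rather than $number.)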

You can then use these substitution terms (group references) to produce formatted output.

For example: $1 [$2] to [$3] $4

should return:

08:25:20.740 [1124] to [5760] and I talked to him and he
gave me a credit card number
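In Java, the substitution above can be tried with Matcher.replaceAll, which accepts $1-style group references in its replacement string. The following is only a minimal sketch: the class name ReplaceDemo and the method format are made up for illustration, and the sample entry is the one from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class ReplaceDemo {
    // The regex from the answer, with backslashes doubled for a Java string.
    // Forward slashes need no escaping in Java, so "\/" is written as "/".
    static final Pattern LANMSG = Pattern.compile(
        "(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s\\[D\\].+<MBXID>(\\d+)</MBXID>"
        + "<MBXTO>(\\d+)</MBXTO>.+<MSGTEXT>(.+)</MSGTEXT>",
        Pattern.DOTALL); // DOTALL lets '.' match line breaks too

    // The sample LANMSG entry from the question, line breaks included.
    static final String SAMPLE =
        "08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>\n"
        + "<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT><MOBILEADDR>\n"
        + "</MOBILEADDR><LAP>0</LAP><SMS>0</SMS><MSGTEXT>and I talked to him and he \n"
        + "gave me a credit card number</MSGTEXT>";

    // Replaces the whole matched entry with the formatted summary.
    static String format(String entry) {
        return LANMSG.matcher(entry).replaceAll("$1 [$2] to [$3] $4");
    }

    public static void main(String[] args) {
        System.out.println(format(SAMPLE));
    }
}
```

The line break inside the message text is kept, because group 4 captures it along with the rest of the message.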

Keep in mind that when you put the regex into Java code you have to escape every backslash (\), so the regex ends up looking longer:

// The quantifiers are lazy (.+?) so that each LANMSG entry is matched on its own
// when the input contains several of them.
Pattern pattern = Pattern.compile("(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s\\[D\\].+?<MBXID>(\\d+)<\\/MBXID><MBXTO>(\\d+)<\\/MBXTO>.+?<MSGTEXT>(.+?)<\\/MSGTEXT>", Pattern.DOTALL);
// DOTALL makes '.' also match the line breaks in the input. MULTILINE is not needed here:
// it only changes ^ and $, which this pattern does not use. Looping on find() returns every match.
Matcher matcher = pattern.matcher(input); // input: the log text as a single String
while (matcher.find()) {
    String output = matcher.group(1) + " [" + matcher.group(2) + "] to [" + matcher.group(3) + "] " + matcher.group(4);
    System.out.println(output);
}
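The regex above does not capture the sender ID the question asks about. One possible way to get it is to let the match continue past </MSGTEXT> to the first [S:...] field that follows. This is a sketch under assumptions: the sender ID is taken to be the fourth number in [S:1:1:1124:5607:5], as the question describes, and the names SenderIdDemo and extract are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class SenderIdDemo {
    // Group 1: recipient (MBXTO), group 2: message text,
    // group 3: assumed sender ID (fourth number of the next [S:...] field).
    static final Pattern MESSAGE = Pattern.compile(
        "<MBXTO>(\\d+)</MBXTO>.+?<MSGTEXT>(.+?)</MSGTEXT>"
        + ".*?\\[S:\\d+:\\d+:\\d+:(\\d+):\\d+\\]",
        Pattern.DOTALL); // DOTALL lets '.' cross line breaks

    // The sample lines from the question.
    static final String SAMPLE =
        "08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>\n"
        + "<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT><MOBILEADDR>\n"
        + "</MOBILEADDR><LAP>0</LAP><SMS>0</SMS><MSGTEXT>and I talked to him and he \n"
        + "gave me a credit card number</MSGTEXT>\n"
        + "08:25:20.751 [+] [T:000FF0] [S:1:1:1124:5607:5] LANMSG [15/2 | 0]\n"
        + "08:25:20.945 [+] [T:000FF4] [S:1:1:1124:5607:5] LANMSGTYPESTOPPED [0/2 | 0]\n";

    // Returns one "SENDER to RECIPIENT text" line per message found.
    static List<String> extract(String logText) {
        List<String> lines = new ArrayList<>();
        Matcher m = MESSAGE.matcher(logText);
        while (m.find()) {
            // Join a message that wraps across lines into a single line.
            String text = m.group(2).replaceAll("\\s*\\R\\s*", " ");
            lines.add(m.group(3) + " to " + m.group(1) + " " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        extract(SAMPLE).forEach(System.out::println);
    }
}
```

One limitation of this sketch: if a message is not followed by an [S:...] line, the lazy `.*?` keeps scanning and picks up the next one it finds, so real logs may need a stricter pattern.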

As for your second question (oh, you have edited it out... but I'll answer it anyway): you can parse $2 and $3 into integers:

int id1 = Integer.parseInt(matcher.group(2));
int id2 = Integer.parseInt(matcher.group(3));

That way you can write a method that returns the name belonging to each ID, for example UserUtil.getName(int id).
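A method like UserUtil.getName(int id) could be as simple as a map lookup. The sketch below is purely illustrative: the original post does not show UserUtil, and the names "User A" and "User B" are placeholders, not real data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the original post only mentions the method name.
class UserUtil {
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        // Placeholder entries for the two IDs in the sample log.
        NAMES.put(1124, "User A");
        NAMES.put(5760, "User B");
    }

    // Returns the name for an ID, or a fallback label if the ID is unknown.
    public static String getName(int id) {
        return NAMES.getOrDefault(id, "unknown (" + id + ")");
    }

    public static void main(String[] args) {
        int id1 = Integer.parseInt("1124"); // e.g. matcher.group(2)
        int id2 = Integer.parseInt("5760"); // e.g. matcher.group(3)
        System.out.println(getName(id1) + " to " + getName(id2));
    }
}
```

In a real program the map would presumably be filled from wherever the user list lives, such as a file or a database.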