How to find occurrences when patterns overlap

Date: 2015-11-20 09:15:33

Tags: java regex

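Context: This is a log analysis task. I am writing a regex program to find occurrences of certain requests sent from the client to the server. I have client log files that contain these requests along with other log entries.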

Problem: When a request message is sent to the server, the client writes 2 log statements, such as:

sending..
message_type

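When both of these statements are found, we can say that one request has been sent. A request is identified by this combined pattern.

We expect the log file content to look like this: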

sending..
message_type
...//other text
sending..
message_type
...//other text
sending..
message_type

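From the log above, we can say that the client sent 3 messages. In the actual log file, however, the patterns overlap as shown below (not for every message, only for some):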

sending..(1)
...//other text
sending..(2)
message_type(2)
...//other text
message_type(1)
sending..(3)
message_type(3)

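There are still 3 requests (I numbered the messages to make this easier to follow), but the patterns overlap: the second message was logged before the first one had been fully logged. The description above is only meant to explain the problem. Below is part of the raw log:

Raw log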
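To make the overlap concrete outside of regex, here is a minimal sketch, assuming each line starts with one of the two markers from the example above (sending.. or message_type, with the (n) tags being only annotations). Every sending.. line opens a pending request and every message_type line closes one if any is open. This line-based counter is a swapped-in technique, not something from the question, and PendingPairSketch / countRequests are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the question: counts complete
// "sending.." + "message_type" pairs even when pairs interleave.
class PendingPairSketch {

    static int countRequests(List<String> lines) {
        int pending = 0;   // headers seen but not yet closed by a message_type line
        int requests = 0;  // completed header + message_type pairs
        for (String line : lines) {
            if (line.startsWith("sending..")) {
                pending++;
            } else if (line.startsWith("message_type") && pending > 0) {
                // Close one open header; for counting it does not matter which one.
                pending--;
                requests++;
            }
            // Any other line ("...//other text") is ignored.
        }
        return requests;
    }

    public static void main(String[] args) {
        // The overlapping example from the question, annotations kept.
        List<String> log = Arrays.asList(
            "sending..(1)", "...//other text", "sending..(2)", "message_type(2)",
            "...//other text", "message_type(1)", "sending..(3)", "message_type(3)");
        System.out.println(countRequests(log)); // prints 3
    }
}
```

A message_type line with no open header is not counted, which loosely fits the question's later note that message types are also logged on their own.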

Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>

Following this explanation, a single request is identified by its two parts:

Send message to server:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>

What I tried

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMatcher {

    static final String create_session = "Send message to server(.){10,1000}(<\\?xml(.){10,500}type=\"createsession\"(.){1,100}</message>)";

    public static void main(String[] args) throws IOException {
        StringBuilder b = new StringBuilder();
        // I put the above log in this file
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File("D:/dummy.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                b.append(line); // lines are joined without any line separator
            }
        }

        findMatch(b, "Send message to server", "Send message to server");
        findMatch(b, create_session, "create_session");
    }

    private static int findMatch(StringBuilder b, String pattern, String type) {
        int count = 0;
        Pattern regex = Pattern.compile(pattern, Pattern.MULTILINE);
        Matcher regexMatcher = regex.matcher(b.toString());
        while (regexMatcher.find()) {
            count++;
        }
        System.out.printf("%25s%2d\n", type + ": ", count);
        return count;
    }
}

Current output

The intent is to count the createsession messages that were sent. The code prints:

Send message to server:  2
        create_session:  1
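Why the second count is 1: Matcher.find() starts each new search after the end of the previous match, so matches never overlap, and the greedy (.){10,1000} lets one match run from the first header all the way to the last createsession XML within reach, swallowing the second header. The sketch below reproduces this on a shortened single-line log; the XML and LOG constants and the class name are hypothetical stand-ins, not the real log lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: reproduces the "create_session: 1" result on a shortened log.
class GreedySpanDemo {

    // The create_session pattern from the question, unchanged.
    static final String CREATE_SESSION =
        "Send message to server(.){10,1000}(<\\?xml(.){10,500}type=\"createsession\"(.){1,100}</message>)";

    // Hypothetical, shortened request XML; the real lines are longer.
    static final String XML =
        "<?xml version=\"1.0\"?><message><request type=\"createsession\"/></message>";

    // Lines concatenated with no separator, as the question's read loop does.
    static final String LOG =
        "Send message to server:" + "Created post notification log dir"
        + "Send message to server:" + "Created post notification log dir"
        + XML + "INFO [a] - Server Response: ok" + XML;

    // Collects every match that find() reports (find() never reports overlapping matches).
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Counts occurrences of the literal header inside s.
    static int countHeaders(String s) {
        String header = "Send message to server";
        int n = 0;
        for (int i = s.indexOf(header); i >= 0; i = s.indexOf(header, i + 1)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> hits = matches(CREATE_SESSION, LOG);
        System.out.println("matches: " + hits.size());                         // 1
        System.out.println("headers in match: " + countHeaders(hits.get(0)));  // 2
    }
}
```

Making the first gap reluctant, (.){10,1000}?, does not help on its own: the match still starts at the first header and has to run past the second one to reach any XML, so the count stays at 1.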

Expected output

The log shows that 2 messages were sent, so the output should be:

 Send message to server:  2
         create_session:  2

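You can see the pattern I tried in my code. Can anyone suggest a pattern that gives the desired result?

Note: One might ask why I don't simply count Send message to server on its own. The reason is that the log contains many types of messages, such as login, closesession, etc., and all of them start with Send message to server. In addition, the message types are also logged separately for other purposes, so we cannot rely on either part alone; only the combination tells us that a message was sent.

1 Answer

Answer 0 (score: 1)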

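> find the occurrences of certain requests sent from the client to the server.

> There are "other ways" that you can ignore here, such as Store in DB : (instead of Send message to server) followed by the xml message.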

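I propose a new strategy:

  1. Match all the alternatives with a single regex, so the log is parsed only once (better performance on long files).
  2. Match the type="createsession" XMLs independently.
  3. Also match the Store in DB: XMLs, but ignore them (don't increment any counter).
  4. We can use the following expression to count the messages sent to the server: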

    ^(?<toserver>Send message to server:)
    
    • Note that I'm using a named group, so we can later refer to it as regexMatcher.group("toserver") to increment the counter.

    And match the target XMLs independently with:

    ^(?<message><\? *xml\b.{10,500} type *= *\"createsession\")
    
    • Referenced later as regexMatcher.group("message").
    • It gets its own, separate counter.

    So how do we ignore the Store in DB: XMLs? We can match them without creating a capture group:
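    In Java, (?<name>...) declares a named group (Java 7 and later) and Matcher.group(String) reads it back. In an alternation, a group whose branch did not take part in the current match returns null, which is what the counting logic relies on. A small sketch with a simplified two-branch pattern; NamedGroupDemo and describe are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: what group(name) returns when the branch holding that group
// did not take part in the match (the answer's pattern is simplified here).
class NamedGroupDemo {

    // Two-branch alternation modeled on the answer's pattern.
    static final Pattern P = Pattern.compile(
        "^(?:(?<toserver>Send message to server:)|(?<message><\\?xml))",
        Pattern.MULTILINE);

    // Names, in order, the group that is non-null for each match.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = P.matcher(text);
        while (m.find()) {
            if (m.group("toserver") != null) {
                out.append("toserver ");
            } else if (m.group("message") != null) {
                out.append("message ");
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String text = "Send message to server:\nother text\n<?xml ...>\n";
        System.out.println(describe(text)); // toserver message
    }
}
```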

    ^Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*
    
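    • It matches the literal Store in DB : (the space before the colon is optional), followed by
    • \r?\n(?:.*\n)*? as few lines as possible, until
    • <\? *xml\b.* matches the <?xml line.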
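    The Store in DB branch can be tried on its own. A minimal sketch, assuming \n line endings and a made-up log fragment with one unrelated line between the marker and the XML; StoreInDbSkipDemo and firstSkipped are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the answer's "Store in DB" branch in isolation (assumes \n line endings).
class StoreInDbSkipDemo {

    static final Pattern SKIP = Pattern.compile(
        "^Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*", Pattern.MULTILINE);

    // Returns the text consumed by the first skip match, or null if there is none.
    static String firstSkipped(String text) {
        Matcher m = SKIP.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical fragment: one unrelated line sits between the marker and the XML.
        String text = "Store in DB :\n"
            + "some other line\n"
            + "<?xml version=\"1.0\"?><request type=\"createsession\"/>\n"
            + "INFO [a] - Server Response: ...\n";
        // Prints the marker line, the unrelated line and the <?xml line, but not the INFO line.
        System.out.println(firstSkipped(text));
    }
}
```

    Because this branch consumes the <?xml line, the next find() resumes after it, so the message branch never sees that XML. With Windows (\r\n) line endings only the first break is covered by \r?\n: since . does not match \r, the (?:.*\n) part cannot skip intermediate lines unless it also allows \r?.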

    Regex

    ^(?:Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*|(?<toserver>Send message to server:)|(?<message><\? *xml\b.{10,500} type *= *\"createsession\"))
    

    regex101 demo

    Code

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class LogMatcher {

        static final String create_session = "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*|(?<toserver>Send message to server:)|(?<message><\\? *xml\\b.{10,500} type *= *\\\"createsession\\\"))";

        public static void main(String[] args) throws java.lang.Exception
        {
            // for testing purposes
            final String text = "Send message to server:\nCreated post notification log dir\nCreated post notification log dir\nCreated post notification log dir\nSend message to server:\nCreated post notification log dir\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nStore in DB :\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></params></response></message>\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></response></message>";
            System.out.println("INPUT:\n" + text + "\n\nCOUNT:");
            StringBuilder b = new StringBuilder();
            b.append(text);

            findMatch(b, create_session, "create_session");
        }

        private static int findMatch(StringBuilder b, String pattern, String type) {
            int count = 0;      // counter for "Send message to server:"
            int countType = 0;  // counter for "type=\"createsession\""
            // MULTILINE lets ^ match at the start of every line, not only at the start of the input
            Pattern regex = Pattern.compile(pattern, Pattern.MULTILINE);
            Matcher regexMatcher = regex.matcher(b.toString());
            while (regexMatcher.find()) {
                if (regexMatcher.group("toserver") != null) {
                    count++;
                } else if (regexMatcher.group("message") != null) {
                    countType++;
                } else {
                    // Ignoring "Store in DB :\n<?xml...."
                }
            }
            System.out.printf("%25s%2d\n%25s%2d\n", "to server: ", count, type + ": ", countType);
            return countType;
        }
    }
    

    Output

    INPUT:
    Send message to server:
    Created post notification log dir
    Created post notification log dir
    Created post notification log dir
    Send message to server:
    Created post notification log dir
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
    Store in DB :
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
    INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
    INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>
    
    COUNT:
                  to server:  2
             create_session:  2
    

    ideone demo
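One practical point when applying this to the log file: the question's read loop appends each line without a separator, so the whole file becomes a single line and ^ (even with Pattern.MULTILINE) can only match at the very start of the input. A minimal sketch that keeps the line breaks before matching, assuming the log is a UTF-8 text file; LineBreakDemo, readLog and the fallback path dummy.txt are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: read the log with its line breaks intact so that ^ in MULTILINE mode
// can match at the start of every line, not just at the start of the file.
class LineBreakDemo {

    static final Pattern HEADER =
        Pattern.compile("^Send message to server:", Pattern.MULTILINE);

    // readAllLines strips the line terminators, so join the lines back with "\n".
    static String readLog(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file); // decodes as UTF-8
        return String.join("\n", lines);
    }

    static int countHeaders(String text) {
        int n = 0;
        Matcher m = HEADER.matcher(text);
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "dummy.txt"); // hypothetical path
        String withBreaks = readLog(file);
        String withoutBreaks = String.join("", Files.readAllLines(file)); // what the question's loop builds
        System.out.println("with line breaks:    " + countHeaders(withBreaks));
        System.out.println("without line breaks: " + countHeaders(withoutBreaks));
    }
}
```

On the raw log from the question, which starts with a header, the first count should be 2 and the second only 1.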