I have a simple log example:
2017-02-02 09:58:12,764 - INFO - PRC0XK - logged in
2017-02-02 09:58:13,766 - INFO - L3J5WW - logged in
2017-02-02 09:58:14,005 - INFO - 0NKCVZ - call s2
2017-02-02 09:58:14,767 - INFO - P0QIOW - logged in
2017-02-02 09:58:15,729 - INFO - E0MVFZ - call s2
2017-02-02 09:58:16,257 - INFO - L3J5WW - call s2
2017-02-02 09:58:17,750 - INFO - PRC0XK - call s2
2017-02-02 09:58:21,908 - INFO - P0QIOW - call s2
2017-02-02 09:58:30,479 - INFO - PRC0XK - get answer from s2
2017-02-02 09:58:30,479 - INFO - PRC0XK - logged out
由"{timestamp} - {LogLevel} - {USERID} - {Action}"
等字段组成。
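For illustration, a minimal Java sketch of how one such line splits into its fields (the class name LogLineDemo is mine, purely for the demo):

public class LogLineDemo {
    public static void main(String[] args) {
        // Split one log line on the literal " - " separator.
        // Fields: [0] timestamp, [1] log level, [2] USERID, [3] action.
        String line = "2017-02-02 09:58:12,764 - INFO - PRC0XK - logged in";
        String[] fields = line.split(" - ");
        System.out.println(fields[2]); // prints: PRC0XK
    }
}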
I want to use it as input and collect the actions per USERID, one by one. Later, I want to add another log file of the same format (which also has slightly modified USERIDs) and gather all the actions from both logs by USERID.
I tried to use an aggregation strategy, but I got something I did not expect. My Camel route is:
<route id="fileeater">
<description>
this route will eat log file and try to put guid through lot of log entry by some identifier
</description>
<from uri="file://data/in?charset=utf-8"/>
<split streaming="true">
<tokenize token="\n"/>
<to uri="log:gotlogline"/>
<aggregate strategyRef="SimpleAggregationStrategy" completionSize="4">
<correlationExpression>
<constant>true</constant>
</correlationExpression>
<log logName="LOGEater" message="this is logeater part"/>
<to uri="file://data/out"/>
</aggregate>
</split>
where SimpleAggregationStrategy is:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class SimpleAggregationStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // First message of a group: nothing to merge yet.
        if (oldExchange == null) {
            return newExchange;
        }
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        String body = oldBody;
        // Append the new line only when its USERID (third field) matches.
        if (oldBody.split(" - ")[2].equalsIgnoreCase(newBody.split(" - ")[2])) {
            body = oldBody + "\n" + newBody;
        }
        oldExchange.getIn().setBody(body);
        return oldExchange;
    }
}
So, I expected the log entries to come out grouped by USERID:
...
2017-02-02 09:59:45,599 - INFO - NU7444 - logged in
2017-02-02 09:59:51,229 - INFO - NU7444 - call s2
2017-02-02 10:00:09,818 - INFO - NU7444 - get answer from s2
2017-02-02 10:00:09,818 - INFO - NU7444 - logged out
...
But all I get in the output file is two lines:
2017-02-02 10:00:09,818 - INFO - NU7444 - get answer from s2
2017-02-02 10:00:09,818 - INFO - NU7444 - logged out
My idea is about the correlationExpression in the aggregate: can I use part of the log line (split(" - ")[2], i.e. the USERID) to bind the entries together through aggregation?
I read http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/ and found that aggregating by header is faster than simple aggregation. So, can I use part of the line as a header after the split and then collect by that header? Should I use a processor to extract that part of the line (the USERID) and put it into a header?
Answer 0 (score: 0)

First, a processor puts the USERID (and the log level) into message headers:
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class UserIDProcessor implements Processor {

    @Override
    public void process(Exchange exchange) throws Exception {
        String input = exchange.getIn().getBody(String.class);
        String[] fields = input.split(" - ");
        // Expose the log level and USERID as headers so the route
        // can correlate on them.
        if (fields.length > 2) {
            exchange.getIn().setHeader("LOGLEVEL", fields[1]);
            exchange.getIn().setHeader("USERID", fields[2]);
        }
        exchange.getIn().setBody(input);
    }
}
Then, I use a simple aggregation strategy to collect the messages:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class SimpleAggregationStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // First message of a group: nothing to merge yet.
        if (oldExchange == null) {
            return newExchange;
        }
        // Correlation already guarantees matching USERIDs here,
        // so simply concatenate the bodies.
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        oldExchange.getIn().setBody(oldBody + "\r\n" + newBody);
        return oldExchange;
    }
}
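To see what the strategy does in isolation, here is a minimal standalone sketch (my own test scaffolding, assuming Camel 2.x where DefaultExchange can be constructed from a CamelContext); it feeds two lines through the strategy and prints the merged body:

import org.apache.camel.Exchange;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.impl.DefaultExchange;

public class StrategyDemo {
    public static void main(String[] args) {
        DefaultCamelContext ctx = new DefaultCamelContext();
        SimpleAggregationStrategy strategy = new SimpleAggregationStrategy();

        // Two log lines for the same USERID, wrapped in exchanges.
        Exchange first = new DefaultExchange(ctx);
        first.getIn().setBody("2017-02-02 09:58:12,764 - INFO - PRC0XK - logged in");
        Exchange second = new DefaultExchange(ctx);
        second.getIn().setBody("2017-02-02 09:58:17,750 - INFO - PRC0XK - call s2");

        // The aggregator passes null as the old exchange for the first message.
        Exchange merged = strategy.aggregate(strategy.aggregate(null, first), second);
        System.out.println(merged.getIn().getBody(String.class));
        // prints both lines joined by \r\n
    }
}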
The route itself is very simple (you can add a completion timeout and adjust the completion size in the aggregate part as needed; note the fileExist=append on the output endpoint, since the file producer's default is to overwrite the file each time an aggregated group completes):
<route id="fileeater">
<description>
this route will eat log file and try to put guid through lot of log entry by some identifier
</description>
<from uri="file://data/in?charset=utf-8&delete=false&readLock=idempotent-changed&readLockCheckInterval=5000"/>
<split streaming="true">
<tokenize token="\n"/>
<process ref="UIDProcessor"/>
<aggregate strategyRef="SimpleAggregationStrategy" completionSize="4">
<correlationExpression>
<simple>header.USERID</simple>
</correlationExpression>
<to uri="log:gotlogline"/>
<to uri="file://data/out?fileExist=append"/>
</aggregate>
</split>
</route>
Also, to speed up the parsing, you can add parallelProcessing="true" to the <split> tag and get results very quickly.