I have a simple log example:
2017-02-02 09:58:12,764 - INFO - PRC0XK - logged in
2017-02-02 09:58:13,766 - INFO - L3J5WW - logged in
2017-02-02 09:58:14,005 - INFO - 0NKCVZ - call s2
2017-02-02 09:58:14,767 - INFO - P0QIOW - logged in
2017-02-02 09:58:15,729 - INFO - E0MVFZ - call s2
2017-02-02 09:58:16,257 - INFO - L3J5WW - call s2
2017-02-02 09:58:17,750 - INFO - PRC0XK - call s2
2017-02-02 09:58:21,908 - INFO - P0QIOW - call s2
2017-02-02 09:58:30,479 - INFO - PRC0XK - get answer from s2
2017-02-02 09:58:30,479 - INFO - PRC0XK - logged out
由"{timestamp} - {LogLevel} - {USERID} - {Action}"
等字段组成。
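For illustration, a minimal Java sketch of how one such line splits into its fields (the class name LogLineDemo is mine, purely for the demo):

public class LogLineDemo {
    public static void main(String[] args) {
        // Split one log line on the literal " - " separator.
        // Fields: [0] timestamp, [1] log level, [2] USERID, [3] action.
        String line = "2017-02-02 09:58:12,764 - INFO - PRC0XK - logged in";
        String[] fields = line.split(" - ");
        System.out.println(fields[2]); // prints: PRC0XK
    }
}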
I want to use it as input and collect the actions per USERID, one by one. Later, I want to add another log file of the same format (which also has slightly modified USERIDs) and gather all the actions from both logs by USERID.
I tried to use an aggregation strategy, but I got something I did not expect. My Camel route is:
<route id="fileeater">
<description>
this route will eat log file and try to put guid through lot of log entry by some identifier
</description>
<from uri="file://data/in?charset=utf-8"/>
<split streaming="true">
<tokenize token="\n"/>
<to uri="log:gotlogline"/>
<aggregate strategyRef="SimpleAggregationStrategy" completionSize="4">
<correlationExpression>
<constant>true</constant>
</correlationExpression>
<log logName="LOGEater" message="this is logeater part"/>
<to uri="file://data/out"/>
</aggregate>
</split>
where SimpleAggregationStrategy is:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class SimpleAggregationStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // First message of a group: nothing to merge yet.
        if (oldExchange == null) {
            return newExchange;
        }
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        String body = oldBody;
        // Append the new line only when its USERID (third field) matches.
        if (oldBody.split(" - ")[2].equalsIgnoreCase(newBody.split(" - ")[2])) {
            body = oldBody + "\n" + newBody;
        }
        oldExchange.getIn().setBody(body);
        return oldExchange;
    }
}
So, I expected the log entries to come out grouped by USERID:
...
2017-02-02 09:59:45,599 - INFO - NU7444 - logged in
2017-02-02 09:59:51,229 - INFO - NU7444 - call s2
2017-02-02 10:00:09,818 - INFO - NU7444 - get answer from s2
2017-02-02 10:00:09,818 - INFO - NU7444 - logged out
...
But all I get in the output file is two lines:
2017-02-02 10:00:09,818 - INFO - NU7444 - get answer from s2
2017-02-02 10:00:09,818 - INFO - NU7444 - logged out
My idea is about the correlationExpression in the aggregate: can I use part of the log line (split(" - ")[2], i.e. the USERID) to bind the entries together through aggregation?
I read http://www.catify.com/2012/07/09/parsing-large-files-with-apache-camel/ and found that aggregating by header is faster than simple aggregation. So, can I use part of the line as a header after the split and then collect by that header? Should I use a processor to extract that part of the line (the USERID) and put it into a header?
Answer 0 (score: 0)

First, a processor puts the USERID (and the log level) into message headers:
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class UserIDProcessor implements Processor {

    @Override
    public void process(Exchange exchange) throws Exception {
        String input = exchange.getIn().getBody(String.class);
        String[] fields = input.split(" - ");
        // Expose the log level and USERID as headers so the route
        // can correlate on them.
        if (fields.length > 2) {
            exchange.getIn().setHeader("LOGLEVEL", fields[1]);
            exchange.getIn().setHeader("USERID", fields[2]);
        }
        exchange.getIn().setBody(input);
    }
}
Then, I use a simple aggregation strategy to collect the messages:
import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class SimpleAggregationStrategy implements AggregationStrategy {

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        // First message of a group: nothing to merge yet.
        if (oldExchange == null) {
            return newExchange;
        }
        // Correlation already guarantees matching USERIDs here,
        // so simply concatenate the bodies.
        String oldBody = oldExchange.getIn().getBody(String.class);
        String newBody = newExchange.getIn().getBody(String.class);
        oldExchange.getIn().setBody(oldBody + "\r\n" + newBody);
        return oldExchange;
    }
}
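To see what the strategy does in isolation, here is a minimal standalone sketch (my own test scaffolding, assuming Camel 2.x where DefaultExchange can be constructed from a CamelContext); it feeds two lines through the strategy and prints the merged body:

import org.apache.camel.Exchange;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.impl.DefaultExchange;

public class StrategyDemo {
    public static void main(String[] args) {
        DefaultCamelContext ctx = new DefaultCamelContext();
        SimpleAggregationStrategy strategy = new SimpleAggregationStrategy();

        // Two log lines for the same USERID, wrapped in exchanges.
        Exchange first = new DefaultExchange(ctx);
        first.getIn().setBody("2017-02-02 09:58:12,764 - INFO - PRC0XK - logged in");
        Exchange second = new DefaultExchange(ctx);
        second.getIn().setBody("2017-02-02 09:58:17,750 - INFO - PRC0XK - call s2");

        // The aggregator passes null as the old exchange for the first message.
        Exchange merged = strategy.aggregate(strategy.aggregate(null, first), second);
        System.out.println(merged.getIn().getBody(String.class));
        // prints both lines joined by \r\n
    }
}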
The route itself is very simple (you can add a completion timeout and adjust the completion size in the aggregate part as needed; note the fileExist=append on the output endpoint, since the file producer's default is to overwrite the file each time an aggregated group completes):
<route id="fileeater">
<description>
this route will eat log file and try to put guid through lot of log entry by some identifier
</description>
<from uri="file://data/in?charset=utf-8&delete=false&readLock=idempotent-changed&readLockCheckInterval=5000"/>
<split streaming="true">
<tokenize token="\n"/>
<process ref="UIDProcessor"/>
<aggregate strategyRef="SimpleAggregationStrategy" completionSize="4">
<correlationExpression>
<simple>header.USERID</simple>
</correlationExpression>
<to uri="log:gotlogline"/>
<to uri="file://data/out?fileExist=append"/>
</aggregate>
</split>
</route>
Also, to speed up the parsing, you can add parallelProcessing="true" to the <split> tag and get results very quickly.