Flume Regex Filtering Interceptor not working as expected

Time: 2013-11-19 15:13:21

Tags: regex apache log4j flume

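I am trying to implement a simple Flume test application, like the one in the Flume User Guide, except that I want to use log4j as the source and accept only the logs that match a certain regular expression. So I implemented a random log generator and configured log4j and Flume as follows: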

log4j.properties

# Root logger option
log4j.rootLogger=ALL, stdout
#
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L -     %m%n
#
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 41414

# configure a class's logger to output to the flume appender
log4j.logger.generator.LogGenerator = INFO,flume

log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

secondExample.conf

# Flume test file
# Listens via Avro RPC on port 41414 and dumps data received to the log

agent.channels = ch-1
agent.sources = src-1
agent.sinks = sink-1

agent.channels.ch-1.type = memory
agent.channels.ch-1.capacity = 10000000
agent.channels.ch-1.transactionCapacity = 1000

agent.sources.src-1.type = avro
agent.sources.src-1.channels = ch-1
agent.sources.src-1.bind = localhost
agent.sources.src-1.port = 41414

agent.sources.src-1.interceptors = intrcptr
agent.sources.src-1.interceptors.intrcptr.type = regex_filter
agent.sources.src-1.interceptors.intrcptr.regex = "ERROR [0-4]:"

agent.sinks.sink-1.type = logger
agent.sinks.sink-1.channel = ch-1

Sample generated logs:

2013-11-18 15:27:06 ERROR LogGenerator:33 - ERROR 3: 68290a60-8c25-494d-9d0d-4361a01f065f
2013-11-18 15:27:35 WARN  LogGenerator:33 - ERROR 2: 154c4779-ad6a-4b10-9ba7-199a02ad7554
2013-11-18 15:28:49 WARN  LogGenerator:33 - ERROR 5: a2a94b78-e387-4937-b6b3-c480e2c7ea76
2013-11-18 15:29:35 FATAL LogGenerator:33 - ERROR 6: 49baaa6b-19cb-48c8-9f92-b7a75f8d04dc

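The problem is that the Regex Filtering Interceptor does not exclude any events, and the logger sink logs every generated log message. I found the interceptor's source code and wrote a small test using the generated logs and a slightly modified intercept method (so that it accepts and returns strings instead of events), and it works as expected.

I am really confused now and inclined to think this is a Flume bug. Any help would be greatly appreciated.

P.S. I am using Flume "apache-flume-1.4.0-bin".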

1 answer:

Answer 0 (score: 1)

This appears to be a Flume issue. It is reproducible when Flume is configured as follows:

Log4jAppender -> Avro source -> Regex interceptor -> Logger Sink

A workaround is to configure Flume with two agents:

Log4jAppender -> Avro source -> Avro sink -> Avro source -> Regex interceptor -> Logger Sink
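The two-agent pipeline above could be wired up roughly like this. This is only a minimal sketch, not a tested configuration: the agent names (agent1, agent2), the component names, the channel sizes, and the intermediate port 41415 are assumptions made for illustration. The regex is written without surrounding quotes on the assumption that Flume reads the value literally, as an ordinary Java properties value.

```properties
# Hypothetical two-agent layout. Agent names, component names, channel sizes
# and port 41415 are assumptions for this sketch, not part of the original post.

# agent1: receives events from the Log4jAppender and forwards them over Avro.
agent1.sources = src-1
agent1.channels = ch-1
agent1.sinks = avro-sink

agent1.channels.ch-1.type = memory
agent1.channels.ch-1.capacity = 10000
agent1.channels.ch-1.transactionCapacity = 1000

agent1.sources.src-1.type = avro
agent1.sources.src-1.channels = ch-1
agent1.sources.src-1.bind = localhost
agent1.sources.src-1.port = 41414

agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.channel = ch-1
agent1.sinks.avro-sink.hostname = localhost
agent1.sinks.avro-sink.port = 41415

# agent2: receives events from agent1, applies the regex filter, logs the rest.
agent2.sources = src-2
agent2.channels = ch-2
agent2.sinks = sink-1

agent2.channels.ch-2.type = memory
agent2.channels.ch-2.capacity = 10000
agent2.channels.ch-2.transactionCapacity = 1000

agent2.sources.src-2.type = avro
agent2.sources.src-2.channels = ch-2
agent2.sources.src-2.bind = localhost
agent2.sources.src-2.port = 41415

# Keep only events whose body matches the pattern (excludeEvents defaults to false).
agent2.sources.src-2.interceptors = intrcptr
agent2.sources.src-2.interceptors.intrcptr.type = regex_filter
agent2.sources.src-2.interceptors.intrcptr.regex = ERROR [0-4]:

agent2.sinks.sink-1.type = logger
agent2.sinks.sink-1.channel = ch-2
```

Both agents can live in one file and be started as separate processes, each selected by name, for example `bin/flume-ng agent --conf conf --conf-file twoAgents.conf --name agent2` and then the same command with `--name agent1` (the file name twoAgents.conf is hypothetical). The log4j.properties from the question would stay unchanged, since the Log4jAppender still sends to port 41414.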