Getting log data in NiFi with ExtractText

Date: 2017-08-25 12:17:21

Tags: apache-nifi

I retrieved custom log data with TailFile and then split the data line by line. Now I want to pull the useful data out of nifi-api.log.

I used this expression:

^(.*)$

But the processor routes the flowfile to unmatched. How should I change my expression?

2 answers:

Answer 0 (score: 2)

It depends on what information you are looking for in the log messages. The expression you posted simply matches the entire content.

Suppose you have the following log output and want to collect the FlowFile repository checkpoint times for analysis:

2017-08-25 10:36:31,942 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 229 milliseconds
2017-08-25 10:36:35,571 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6527aa0 checkpointed with 0 Records and 0 Swap Files in 14 milliseconds (Stop-the-world time = 4 milliseconds, Clear Edit Logs time = 7 millis), max Transaction ID -1
2017-08-25 10:38:31,942 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2017-08-25 10:38:32,162 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6cca70e3 checkpointed with 0 Records and 0 Swap Files in 218 milliseconds (Stop-the-world time = 92 milliseconds, Clear Edit Logs time = 98 millis), max Transaction ID -1
2017-08-25 10:38:32,162 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 218 milliseconds
2017-08-25 10:38:35,584 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6527aa0 checkpointed with 0 Records and 0 Swap Files in 13 milliseconds (Stop-the-world time = 6 milliseconds, Clear Edit Logs time = 4 millis), max Transaction ID -1
2017-08-25 10:40:32,161 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2017-08-25 10:40:32,341 INFO [pool-10-thread-1] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6cca70e3 checkpointed with 0 Records and 0 Swap Files in 177 milliseconds (Stop-the-world time = 71 milliseconds, Clear Edit Logs time = 87 millis), max Transaction ID -1
2017-08-25 10:40:32,341 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 178 milliseconds
2017-08-25 10:40:35,592 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6527aa0 checkpointed with 0 Records and 0 Swap Files in 11 milliseconds (Stop-the-world time = 5 milliseconds, Clear Edit Logs time = 4 millis), max Transaction ID -1

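An expression like ^[\d\-\s\:,]+\s(INFO|WARN|ERROR).*?(\d+) milliseconds filters these messages, and its capture groups give you the severity and the duration of each one. Use the lazy .*? here: with a greedy .*, the second group captures only the last digit of the number (9 instead of 229 for the first line above).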
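To check the pattern outside NiFi, here is a small Python sketch. ExtractText uses Java's java.util.regex, and Python's re treats this particular pattern the same way. It runs a greedy .* and a lazy .*? version of the pattern over lines copied from the sample log; the helper name parse_checkpoint is made up for illustration.

```python
import re

# Greedy ".*" grabs as much as it can, so (\d+) is left with only the last digit
# of the last number that precedes " milliseconds".
GREEDY = re.compile(r"^[\d\-\s\:,]+\s(INFO|WARN|ERROR).*(\d+) milliseconds")
# Lazy ".*?" stops at the first complete number followed by " milliseconds".
LAZY = re.compile(r"^[\d\-\s\:,]+\s(INFO|WARN|ERROR).*?(\d+) milliseconds")


def parse_checkpoint(line, pattern=LAZY):
    """Return (severity, milliseconds), or None when the line does not match
    (the case in which ExtractText routes the flowfile to 'unmatched')."""
    m = pattern.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


lines = [
    "2017-08-25 10:36:31,942 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository "
    "Successfully checkpointed FlowFile Repository with 0 records in 229 milliseconds",
    "2017-08-25 10:36:35,571 INFO [Write-Ahead Local State Provider Maintenance] "
    "org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6527aa0 "
    "checkpointed with 0 Records and 0 Swap Files in 14 milliseconds "
    "(Stop-the-world time = 4 milliseconds, Clear Edit Logs time = 7 millis), max Transaction ID -1",
    "2017-08-25 10:38:31,942 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository "
    "Initiating checkpoint of FlowFile Repository",
]

for line in lines:
    print("greedy:", parse_checkpoint(line, GREEDY), "lazy:", parse_checkpoint(line))
```

The greedy form yields ('INFO', 9) and ('INFO', 4) for the first two lines (the second one is the stop-the-world time, not the total), while the lazy form yields 229 and 14. The "Initiating checkpoint" line matches neither, which is what you want when filtering.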

Answer 1 (score: 1)

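In the ExtractText processor, add a property with the following regular expression to extract the value (the property name, regex, becomes the attribute name):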

regex:(.*)
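As I understand the ExtractText documentation, a property named regex stores capture group 1 in an attribute called regex and also writes indexed copies: regex.0 for the whole match (when "Include Capture Group 0" is enabled), regex.1, and so on. The Python sketch below imitates that naming under those assumptions; extract_attributes is a hypothetical helper, not a NiFi API. Since each flowfile holds a single line after the split and . does not match line breaks by default, (.*) captures the whole line.

```python
import re


def extract_attributes(prop_name, pattern, content, include_group_0=True):
    """Hypothetical imitation of ExtractText's attribute naming (an assumption, not NiFi code)."""
    m = re.search(pattern, content)
    if m is None:
        return None  # ExtractText would send the flowfile to "unmatched"
    attrs = {}
    if m.re.groups >= 1:
        attrs[prop_name] = m.group(1)  # first capture group under the bare property name
    start = 0 if include_group_0 else 1
    for i in range(start, m.re.groups + 1):
        attrs[f"{prop_name}.{i}"] = m.group(i)  # indexed copies: regex.0, regex.1, ...
    return attrs


line = ("2017-08-25 10:40:32,161 INFO [pool-10-thread-1] "
        "o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository")
print(extract_attributes("regex", r"(.*)", line))
```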

Then use RouteOnAttribute with the following properties to check whether the log line is ERROR, WARN, or INFO:

INFO:${regex:toLower():contains('info')}

ERROR:${regex:toLower():contains('error')}

WARN:${regex:toLower():contains('warn')}
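The three properties can be tried out with the Python sketch below. It assumes RouteOnAttribute's "Route to Property name" strategy, where a flowfile goes to every relationship whose expression evaluates to true and to unmatched when none does. The function name route and the second and third sample lines are made up for illustration; str.lower() stands in for toLower() and the in operator for contains().

```python
# Property name -> substring tested by ${regex:toLower():contains('...')}
ROUTES = {"INFO": "info", "ERROR": "error", "WARN": "warn"}


def route(regex_attr):
    """Return the relationships a flowfile would be sent to (assumed semantics)."""
    lowered = regex_attr.lower()  # ${regex:toLower()}
    matched = [name for name, needle in ROUTES.items() if needle in lowered]
    return matched or ["unmatched"]


ok_line = ("2017-08-25 10:40:32,341 INFO [pool-10-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository "
           "Successfully checkpointed FlowFile Repository with 0 records in 178 milliseconds")
noisy_line = "2017-08-25 10:41:00,000 INFO [main] Retrying after error"  # made-up line
other_line = "2017-08-25 10:41:00,000 DEBUG [main] nothing to see"      # made-up line

for line in (ok_line, noisy_line, other_line):
    print(route(line))
```

Because contains() scans the whole line, an INFO line that merely mentions "error" lands in both INFO and ERROR. Testing only the level field, for example the (INFO|WARN|ERROR) capture group from the other answer, avoids that.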

Now your flowfiles are routed by attribute, and you can do whatever you need with each route.

Hope this helps.