Regex (tFileInputRegex) fails to parse a log file in Talend

Date: 2017-05-12 11:16:54

Tags: regex logging talend

Use case: I need to parse a log file in Talend using the tFileInputRegex component.

Below is a Google Drive link showing the Talend job:

https://drive.google.com/open?id=0B70sWgu-vmdRd2ZiNTJOU2tNdDQ

The regular expression used to parse this file:

"^"+
"([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})"+" "+
"([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})"+" "+
"(.*?)"+" "+
"\\((.*)\\)"+" "+
"\\[(.*)\\]"+" "+
"(.*)"
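The following minimal sketch shows what each group of this concatenated pattern captures. It compiles the same string with java.util.regex and applies it to the first line of the sample log. It assumes tFileInputRegex matches the whole pattern against one line at a time, which fits the per-line "Line doesn't match" messages in the console output. The class name QuestionRegexGroups and the parse helper are hypothetical.

On that line, the lazy (.*?) captures "INFO " with a trailing space, because there are two spaces before the parenthesis. The greedy (.*) inside \\[...\\] runs to the last "] " on the line, so the Collection group also swallows the logger name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionRegexGroups {
    // The question's pattern, concatenated exactly as it is written in tFileInputRegex.
    static final Pattern LOG_LINE = Pattern.compile(
            "^" +
            "([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})" + " " +
            "([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})" + " " +
            "(.*?)" + " " +
            "\\((.*)\\)" + " " +
            "\\[(.*)\\]" + " " +
            "(.*)");

    // First line of the sample log from the question.
    static final String SAMPLE =
            "2017-05-09 10:18:52.743 INFO  (qtp1543727556-22) [   x:UIMATestCollection1] "
            + "o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1]  webapp=/solr path=/update params={}{} 0 66";

    // Returns the six captured groups, or null when the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = parse(SAMPLE);
        for (int i = 0; i < groups.length; i++) {
            // Square brackets make leading and trailing spaces visible.
            System.out.println((i + 1) + ": [" + groups[i] + "]");
        }
    }
}
```

Running main prints the six groups, one per line. A stack-trace line such as "    at java.lang.Thread.run(Thread.java:745)" makes parse return null, because the pattern requires a date at the very start of the line.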

Below is the content of the input file:

2017-05-09 10:18:52.743 INFO  (qtp1543727556-22) [   x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1]  webapp=/solr path=/update params={}{} 0 66
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [   x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:736)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:124)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)

2017-05-10 10:18:52.808 INFO  (qtp1543727556-13) [   x:UIMATestCollection1] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

The Talend job console shows the following errors:

Starting job Test_1 at 16:32 12/05/2017.

[statistics] connecting to socket on port 3827
[statistics] connected
[INFO ]: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
Line doesn't match:     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
Line doesn't match:     at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
Line doesn't match:     at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
Line doesn't match:     at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
Line doesn't match:     at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
Line doesn't match:     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
Line doesn't match:     at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
Line doesn't match:     at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
Line doesn't match:     at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
Line doesn't match:     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:736)
Line doesn't match:     at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
Line doesn't match:     at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
Line doesn't match:     at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:124)
Line doesn't match:     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
Line doesn't match:     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
Line doesn't match:     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
Line doesn't match:     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
Line doesn't match:     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
Line doesn't match:     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
Line doesn't match:     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
Line doesn't match:     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
Line doesn't match:     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
Line doesn't match:     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
Line doesn't match:     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
Line doesn't match:     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
Line doesn't match:     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
Line doesn't match:     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
Line doesn't match:     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
Line doesn't match:     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
Line doesn't match:     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
Line doesn't match:     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
Line doesn't match:     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
Line doesn't match:     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
Line doesn't match:     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
Line doesn't match:     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
Line doesn't match:     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
Line doesn't match:     at org.eclipse.jetty.server.Server.handle(Server.java:534)
Line doesn't match:     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
Line doesn't match:     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
Line doesn't match:     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
Line doesn't match:     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
Line doesn't match:     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
Line doesn't match:     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
Line doesn't match:     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
Line doesn't match:     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
Line doesn't match:     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
Line doesn't match:     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
Line doesn't match:     at java.lang.Thread.run(Thread.java:745)
Line doesn't match: 
.----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------.
|                                                                                                                                                     tLogRow_1                                                                                                                                                      |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------=|
|Date      |Time        |Log_Level|App_Thread      |Collection                                                                                              |Message                                                                                                                                                 |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------=|
|2017-05-09|10:18:52.743|INFO     |qtp1543727556-22|   x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1                      | webapp=/solr path=/update params={}{} 0 66                                                                                                             |
|2017-05-09|10:18:52.745|ERROR    |qtp1543727556-22|   x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1|unknown field 'sentence'                                                                                                                                |
|2017-05-10|10:18:52.808|INFO     |qtp1543727556-13|   x:UIMATestCollection1                                                                                |o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}|
'----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------'

[statistics] disconnected
Job Test_1 ended at 16:32 12/05/2017. [exit code=0]

The partial output I get is the tLogRow_1 table in the console output above.

1 answer:

Answer 0 (score: 1)

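The error says "Line doesn't match" because your regex forces every line to start with a date followed by a time. The stack-trace lines start with whitespace and "at", and one line is empty, so none of them can match. You should use a tool such as https://regex101.com/ to test your regex against each different case.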
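Besides regex101, the same per-case check can be scripted in Java, the language Talend generates jobs in. This presumably uses the same java.util.regex engine as the component, though that is an assumption.

This is a minimal sketch. The class RegexCaseCheck, its case labels, and its helper methods are hypothetical, and the sample lines are copied from the input file above. With the question's pattern, the two header lines match while the stack-trace line and the empty line do not, which mirrors the console output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RegexCaseCheck {
    // The question's pattern, concatenated exactly as written in the component.
    static final Pattern QUESTION_PATTERN = Pattern.compile(
            "^" +
            "([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})" + " " +
            "([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})" + " " +
            "(.*?)" + " " +
            "\\((.*)\\)" + " " +
            "\\[(.*)\\]" + " " +
            "(.*)");

    // One line of each kind found in the input file, copied from the sample log.
    static Map<String, String> sampleCases() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("INFO header line",
                "2017-05-10 10:18:52.808 INFO  (qtp1543727556-13) [   x:UIMATestCollection1] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}");
        cases.put("ERROR header line",
                "2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [   x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'");
        cases.put("stack-trace line", "    at java.lang.Thread.run(Thread.java:745)");
        cases.put("empty line", "");
        return cases;
    }

    // For each case, records whether the pattern matches the entire line.
    static Map<String, Boolean> check(Pattern pattern, Map<String, String> cases) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : cases.entrySet()) {
            results.put(entry.getKey(), pattern.matcher(entry.getValue()).matches());
        }
        return results;
    }

    public static void main(String[] args) {
        check(QUESTION_PATTERN, sampleCases()).forEach((name, matched) ->
                System.out.println((matched ? "match     " : "no match  ") + name));
    }
}
```

To try a revised pattern, pass it to check instead of QUESTION_PATTERN and compare the results case by case.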

After some testing, it appears you need to rework the whole regex so that it also matches the other cases.
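Below is a sketch of one possible rework. It rests on two assumptions:

- tFileInputRegex rejects any line on which the whole pattern fails, as the console output suggests.
- A group that takes no part in the match reaches the output schema as null. This is not verified.

The pattern makes three changes:

- The header part is wrapped in an optional non-capturing group, so stack-trace lines and the empty line still match with only the last group set.
- [^)]* and [^\]]* stop at the first closing bracket, so Collection no longer swallows the logger name.
- (\S+) captures the log level without trailing spaces.

The class name RevisedLogRegex and the parse helper are hypothetical. Only the concatenated string expression, without the comments, would go into the component.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RevisedLogRegex {
    // Hypothetical revision: the header is optional, so every line in the file matches.
    static final Pattern LOG_LINE = Pattern.compile(
            "^" +
            "(?:" +                                                // optional header starts here
            "([0-9]{4}-[0-9]{2}-[0-9]{2})" + " " +                 // 1: date
            "([0-9]{2}:[0-9]{2}:[0-9]{2}\\.[0-9]{3})" + " " +      // 2: time
            "(\\S+)" + "\\s+" +                                    // 3: log level, without trailing spaces
            "\\(([^)]*)\\)" + "\\s+" +                             // 4: thread, up to the first ')'
            "\\[\\s*([^\\]]*)\\]" + "\\s+" +                       // 5: collection, up to the first ']'
            ")?" +                                                 // end of optional header
            "(.*)$");                                              // 6: message, or the whole line

    // Returns the six groups (null where a group did not take part), or null on no match.
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2017-05-09 10:18:52.743 INFO  (qtp1543727556-22) [   x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1]  webapp=/solr path=/update params={}{} 0 66",
            "    at java.lang.Thread.run(Thread.java:745)",
            ""
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parse(line)));
        }
    }
}
```

This keeps every stack-trace line as its own row with an empty header. Attaching those lines to the preceding log entry would need further processing in the job, which this sketch does not attempt.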