Use case: I need to parse a log file in Talend using the tFileInputRegex component.
Here is a Google Drive link showing the Talend Job: https://drive.google.com/open?id=0B70sWgu-vmdRd2ZiNTJOU2tNdDQ
The regular expression used to parse this file:
"^"+
"([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})"+" "+
"([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})"+" "+
"(.*?)"+" "+
"\\((.*)\\)"+" "+
"\\[(.*)\\]"+" "+
"(.*)"
Here is the content of the input file:
2017-05-09 10:18:52.743 INFO (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1] webapp=/solr path=/update params={}{} 0 66
2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) [ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1] unknown field 'sentence'
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:736)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:124)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
2017-05-10 10:18:52.808 INFO (qtp1543727556-13) [ x:UIMATestCollection1] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
The Talend job console reports the following errors:
Starting job Test_1 at 16:32 12/05/2017.
[statistics] connecting to socket on port 3827
[statistics] connected
[INFO ]: org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
Line doesn't match: at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:183)
Line doesn't match: at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
Line doesn't match: at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:277)
Line doesn't match: at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
Line doesn't match: at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
Line doesn't match: at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
Line doesn't match: at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
Line doesn't match: at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:955)
Line doesn't match: at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1110)
Line doesn't match: at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:736)
Line doesn't match: at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
Line doesn't match: at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
Line doesn't match: at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:124)
Line doesn't match: at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:250)
Line doesn't match: at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
Line doesn't match: at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
Line doesn't match: at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
Line doesn't match: at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
Line doesn't match: at org.apache.solr.core.SolrCore.execute(SolrCore.java:2306)
Line doesn't match: at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
Line doesn't match: at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
Line doesn't match: at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
Line doesn't match: at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
Line doesn't match: at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
Line doesn't match: at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
Line doesn't match: at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
Line doesn't match: at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
Line doesn't match: at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
Line doesn't match: at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
Line doesn't match: at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
Line doesn't match: at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
Line doesn't match: at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
Line doesn't match: at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
Line doesn't match: at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
Line doesn't match: at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
Line doesn't match: at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
Line doesn't match: at org.eclipse.jetty.server.Server.handle(Server.java:534)
Line doesn't match: at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
Line doesn't match: at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
Line doesn't match: at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
Line doesn't match: at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
Line doesn't match: at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
Line doesn't match: at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
Line doesn't match: at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
Line doesn't match: at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
Line doesn't match: at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
Line doesn't match: at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
Line doesn't match: at java.lang.Thread.run(Thread.java:745)
Line doesn't match:
.----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------.
| tLogRow_1 |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------=|
|Date |Time |Log_Level|App_Thread |Collection |Message |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------=|
|2017-05-09|10:18:52.743|INFO |qtp1543727556-22| x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1 | webapp=/solr path=/update params={}{} 0 66 |
|2017-05-09|10:18:52.745|ERROR |qtp1543727556-22| x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1|unknown field 'sentence' |
|2017-05-10|10:18:52.808|INFO |qtp1543727556-13| x:UIMATestCollection1 |o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}|
'----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------'
[statistics] disconnected
Job Test_1 ended at 16:32 12/05/2017. [exit code=0]
The partial output I get is shown below:
.----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------.
| tLogRow_1 |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------=|
|Date |Time |Log_Level|App_Thread |Collection |Message |
|=---------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------=|
|2017-05-09|10:18:52.743|INFO |qtp1543727556-22| x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory [UIMATestCollection1 | webapp=/solr path=/update params={}{} 0 66 |
|2017-05-09|10:18:52.745|ERROR |qtp1543727556-22| x:UIMATestCollection1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR: [doc=1|unknown field 'sentence' |
|2017-05-10|10:18:52.808|INFO |qtp1543727556-13| x:UIMATestCollection1 |o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}|
'----------+------------+---------+----------------+--------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------'
Answer 0 (score: 1)
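The console says "Line doesn't match" because your regular expression requires every line to start with a date and a time, and the stack-trace lines (the ones beginning with "at") do not. You should use a tool such as https://regex101.com/ to test the regular expression against each kind of line.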
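The mismatch can also be reproduced outside Talend. The following is a minimal sketch in plain Java, using the pattern from the question unchanged and two lines copied from the input file. The class name OriginalRegexCheck and its helper methods are made up for this illustration and are not part of Talend. It also shows that the greedy "(.*)" inside the square brackets runs on to the last "] " in the line, which would explain why the Collection column in the tLogRow output contains more than the collection name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for checking the pattern from the question.
final class OriginalRegexCheck {
    // The regular expression from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "^"
      + "([0-9]{4}\\-[0-9]{2}\\-[0-9]{2})" + " "
      + "([0-9]{2}\\:[0-9]{2}\\:[0-9]{2}\\.[0-9]{3})" + " "
      + "(.*?)" + " "
      + "\\((.*)\\)" + " "
      + "\\[(.*)\\]" + " "
      + "(.*)");

    // True when the whole line matches the pattern.
    static boolean matches(String line) {
        return ORIGINAL.matcher(line).matches();
    }

    // Returns capture group n, or null when the line does not match.
    static String group(String line, int n) {
        Matcher m = ORIGINAL.matcher(line);
        return m.matches() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String logLine = "2017-05-09 10:18:52.743 INFO (qtp1543727556-22) "
            + "[ x:UIMATestCollection1] o.a.s.u.p.LogUpdateProcessorFactory "
            + "[UIMATestCollection1] webapp=/solr path=/update params={}{} 0 66";
        String traceLine = "at org.apache.solr.update.DocumentBuilder"
            + ".toDocument(DocumentBuilder.java:183)";

        System.out.println("log line matches:   " + matches(logLine));
        System.out.println("trace line matches: " + matches(traceLine));
        // Group 5 is the bracketed "Collection" field.
        System.out.println("collection group:   " + group(logLine, 5));
    }
}
```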
After some testing, you will need to rework the whole regular expression so that it also handles the other cases.
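One possible direction, sketched in plain Java rather than as a Talend configuration: replace the greedy ".*" inside the parentheses and square brackets with negated character classes so each field stops at its first closing delimiter, and treat any line that does not start with a timestamp as a continuation of the previous entry. The class name SolrLogParser, the choice to append stack-trace lines to the message, and the use of "\\s+" between fields are all assumptions made for this sketch; whether tFileInputRegex alone can merge continuation lines is not something the thread establishes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, only to illustrate a reworked pattern.
final class SolrLogParser {
    // Fields: date, time, level, thread, collection, message.
    static final Pattern ENTRY = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2})\\s+"
      + "(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s+"
      + "(\\S+)\\s+"
      + "\\(([^)]*)\\)\\s+"          // stops at the first ')'
      + "\\[\\s*([^\\]]*)\\]\\s+"    // stops at the first ']'
      + "(.*)$");

    // Returns one String[6] per log entry. Lines that do not match
    // (for example stack-trace lines) are appended to the message
    // of the preceding entry; this is an assumption of the sketch.
    static List<String[]> parse(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                String[] rec = new String[6];
                for (int i = 0; i < 6; i++) {
                    rec[i] = m.group(i + 1);
                }
                records.add(rec);
            } else if (!records.isEmpty() && !line.trim().isEmpty()) {
                String[] last = records.get(records.size() - 1);
                last[5] = last[5] + "\n" + line.trim();
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2017-05-09 10:18:52.745 ERROR (qtp1543727556-22) "
                + "[ x:UIMATestCollection1] o.a.s.h.RequestHandlerBase "
                + "org.apache.solr.common.SolrException: ERROR: [doc=1] "
                + "unknown field 'sentence'",
            "at org.apache.solr.update.DocumentBuilder"
                + ".toDocument(DocumentBuilder.java:183)");
        for (String[] rec : parse(lines)) {
            System.out.println(String.join(" | ", rec));
        }
    }
}
```

With the negated classes, the Collection field ends at the first closing bracket, so a later "[doc=1]" in the message stays in the message. Test the pattern against your real file before relying on it, since the spacing in the actual log may differ from the sample shown here.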