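Regex to extract from a log file until a sequence

Time: 2017-01-22 10:33:28

Tags: javascript java regex logging

I have the following log file. I need to define the log format with a regular expression so that I can use it to extract the log entries.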

_20131005_022047874 ALEPO@ALEPO3 **Exception ServiceConnection / createService methord javax.xml.ws.WebServiceException: Failed to access the WSDL at: http://212.118.158.21:8080/tunnel-web/axis/Portlet_ase_FunctionalDomainService?wsdl. It failed with: 
    Connection refused.
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.tryWithMex(RuntimeWSDLParser.java:151)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.parse(RuntimeWSDLParser.java:133)
    at com.sun.xml.internal.ws.client.WSServiceDelegate.parseWSDL(WSServiceDelegate.java:254)
    at com.sun.xml.internal.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:217)
    at com.sun.xml.internal.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:165)
    at com.sun.xml.internal.ws.spi.ProviderImpl.createServiceDelegate(ProviderImpl.java:93)
    at javax.xml.ws.Service.<init>(Service.java:56)
    at javax.xml.ws.Service.create(Service.java:680)
    at com.stc.alepo.client.ServiceConnection.createService(ServiceConnection.java:75)
    at com.stc.alepo.client.WSSoapHandler.<init>(WSSoapHandler.java:73)
    at com.stc.alepo.client.WSProcessManager.<init>(WSProcessManager.java:114)
    at com.stc.alepo.client.IcmsAlepoRealTime.start(IcmsAlepoRealTime.java:439)
    at com.stc.alepo.client.IcmsAlepoRealTime.main(IcmsAlepoRealTime.java:97)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:323)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
    at java.net.URL.openStream(URL.java:1010)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.createReader(RuntimeWSDLParser.java:793)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.resolveWSDL(RuntimeWSDLParser.java:251)
    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.parse(RuntimeWSDLParser.java:118)
    ... 11 more

_20131005_022047874 ALEPO@ALEPO3 **Exception DCPSoapHandler / constructor methord [Ljava.lang.StackTraceElement;@25b65b7f
_20131005_022047875 ALEPO@ALEPO3 WS17249866 **Exception DCPSoapHandler / invokeSOAPMessage methord java.lang.NullPointerException
    at com.stc.alepo.client.WSSoapHandler.invokeSOAPMessage(WSSoapHandler.java:110)
    at com.stc.alepo.client.WSProcessManager.getWSReply(WSProcessManager.java:174)
    at com.stc.alepo.client.IcmsAlepoRealTime.start(IcmsAlepoRealTime.java:441)
    at com.stc.alepo.client.IcmsAlepoRealTime.main(IcmsAlepoRealTime.java:97)

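I have defined the following regular expression to match the timestamp at the start of each entry's first line. However, I need a second group that captures the rest of the message, including the lines that follow: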

(_\d{1,8}_\w+) (.*)
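A minimal JavaScript sketch of why this pattern falls short, run against a shortened two-entry excerpt of the log above. Because . does not match line breaks, (.*) stops at the end of each entry's first line. The g flag and the use of String.prototype.matchAll (ES2020, Node.js 12+) are additions for illustration only.

```javascript
// Shortened excerpt of the log in the question (two entries).
const log = [
  "_20131005_022047874 ALEPO@ALEPO3 **Exception DCPSoapHandler / constructor methord [Ljava.lang.StackTraceElement;@25b65b7f",
  "_20131005_022047875 ALEPO@ALEPO3 WS17249866 **Exception DCPSoapHandler / invokeSOAPMessage methord java.lang.NullPointerException",
  "    at com.stc.alepo.client.WSSoapHandler.invokeSOAPMessage(WSSoapHandler.java:110)",
  "    at com.stc.alepo.client.WSProcessManager.getWSReply(WSProcessManager.java:174)",
].join("\n");

// The pattern from the question; the g flag is added so matchAll can be used.
const firstLineOnly = /(_\d{1,8}_\w+) (.*)/g;

// Collect group 2 of every match.
const messages = [...log.matchAll(firstLineOnly)].map((m) => m[2]);

// "." does not match "\n", so each message stops at the end of its first line
// and the "    at ..." stack-trace lines are never captured.
console.log(messages);
```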

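How can I make the second group match every character until the first group's pattern occurs again? Alternatively, what is the best practice for this use case? I have many logs and need to define the second group the same way for all of them, and the timestamp format may differ from log to log.

Thanks in advance.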

1 Answer:

Answer 0 (score: 0)

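You can use a regex that captures the timestamp into Group 1 and the rest of the entry (the remainder of the first line plus all following lines that do not start with the timestamp pattern) into Group 2: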

/^(_\d{1,8}_\w+)\s*(.*(?:\r?\n(?!_\d{1,8}_\w+).*)*)/gm

See the regex demo.
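For reference, here is a sketch of applying this regex in JavaScript to a trimmed excerpt of the question's log (stack traces shortened). It assumes Node.js 12+ for String.prototype.matchAll. The { timestamp, message } object shape is an illustrative choice, not part of the answer.

```javascript
// Trimmed excerpt of the log from the question (stack traces shortened).
const log = [
  "_20131005_022047874 ALEPO@ALEPO3 **Exception ServiceConnection / createService methord javax.xml.ws.WebServiceException: Failed to access the WSDL at: http://212.118.158.21:8080/tunnel-web/axis/Portlet_ase_FunctionalDomainService?wsdl. It failed with: ",
  "    Connection refused.",
  "    at com.sun.xml.internal.ws.wsdl.parser.RuntimeWSDLParser.tryWithMex(RuntimeWSDLParser.java:151)",
  "Caused by: java.net.ConnectException: Connection refused",
  "    ... 11 more",
  "",
  "_20131005_022047874 ALEPO@ALEPO3 **Exception DCPSoapHandler / constructor methord [Ljava.lang.StackTraceElement;@25b65b7f",
  "_20131005_022047875 ALEPO@ALEPO3 WS17249866 **Exception DCPSoapHandler / invokeSOAPMessage methord java.lang.NullPointerException",
  "    at com.stc.alepo.client.WSSoapHandler.invokeSOAPMessage(WSSoapHandler.java:110)",
  "    at com.stc.alepo.client.IcmsAlepoRealTime.main(IcmsAlepoRealTime.java:97)",
].join("\n");

// The regex from the answer: group 1 = timestamp, group 2 = rest of the entry.
const entryRegex = /^(_\d{1,8}_\w+)\s*(.*(?:\r?\n(?!_\d{1,8}_\w+).*)*)/gm;

// matchAll requires the g flag; each match exposes both capture groups.
const entries = [...log.matchAll(entryRegex)].map((m) => ({
  timestamp: m[1],
  // trimEnd() drops the trailing newline that group 2 picks up from the
  // blank separator line before the next entry.
  message: m[2].trimEnd(),
}));

for (const { timestamp, message } of entries) {
  console.log(timestamp, "=>", message.split("\n").length, "line(s)");
}
// _20131005_022047874 => 5 line(s)
// _20131005_022047874 => 1 line(s)
// _20131005_022047875 => 3 line(s)
```

The blank line between entries does not start with a timestamp, so group 2 of the first entry absorbs it as a trailing newline. That is why the sketch trims each message.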

Details:

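  • ^ - start of a line (because of the m flag)
  • (_\d{1,8}_\w+) - Group 1 (timestamp): _, 1 to 8 digits, _ and 1+ word characters
  • \s* - 0+ whitespace characters
  • (.*(?:\r?\n(?!_\d{1,8}_\w+).*)*) - Group 2 (everything up to the next timestamp):
    • .* - any 0+ characters other than line break characters
    • (?:\r?\n(?!_\d{1,8}_\w+).*)* - 0+ sequences of:
      • \r?\n(?!_\d{1,8}_\w+) - a line break not followed by the timestamp pattern
      • .* - any 0+ characters other than line break characters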
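Since the question mentions that the timestamp format may vary between logs, one option is to build the same regex around an arbitrary timestamp pattern. The helpers buildEntryRegex and extractEntries below are hypothetical. This sketch assumes the timestamp pattern contains no named groups of its own (it is inserted twice) and it ignores the flags of the regex passed in. The second sample log and its YYYY-MM-DD hh:mm:ss format are invented for illustration.

```javascript
// Hypothetical helper: wraps the answer's regex around any timestamp pattern.
// Assumption: the pattern has no named groups (it is inserted twice, and
// duplicate group names are rejected); its flags are ignored.
function buildEntryRegex(timestampPattern) {
  // Non-capturing wrapper keeps alternations such as "a|b" contained.
  const ts = `(?:${timestampPattern.source})`;
  return new RegExp(
    `^(?<timestamp>${ts})\\s*(?<message>.*(?:\\r?\\n(?!${ts}).*)*)`,
    "gm"
  );
}

// Hypothetical helper: returns { timestamp, message } for every entry.
function extractEntries(log, timestampPattern) {
  return [...log.matchAll(buildEntryRegex(timestampPattern))].map((m) => ({
    timestamp: m.groups.timestamp,
    message: m.groups.message.trimEnd(),
  }));
}

// Timestamp format from the question: _<1-8 digits>_<word characters>
const alepo = extractEntries(
  "_20131005_022047874 A\n    at x\n_20131005_022047875 B",
  /_\d{1,8}_\w+/
);

// Invented alternative format: "YYYY-MM-DD hh:mm:ss"
const iso = extractEntries(
  "2017-01-22 10:33:28 first message\n    continuation line\n2017-01-22 10:33:29 second message",
  /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
);

console.log(alepo, iso);
```

Named groups (ES2018) are used here so that any capturing groups inside a timestamp pattern do not shift the group numbers of the timestamp and message.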