Regex SerDe in Hive to read log files

Date: 2016-02-03 02:28:58

Tags: regex linux hadoop hive

I'm trying to create a regex SerDe in Hive to read some log files, but I'm having trouble getting it to work.

The log files look like this:

14.196.202.16:9123  11329   2016-01-27 17:50:26.965 -5                  Thread-14960    CCS 6104    1   Audit.rds.CCS       reportDataService       Failure <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>   <trace>ClientAbortException:  java.net.SocketException: Broken pipe     at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)  at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)     at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)  at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)   at java.io.BufferedOutputStream.write(Unknown Source)   at java.io.BufferedOutputStream.write(Unknown Source)   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)  at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)   at sun.nio.cs.StreamEncoder.write(Unknown Source)   at java.io.OutputStreamWriter.write(Unknown Source)     at java.io.BufferedWriter.flushBuffer(Unknown Source)   at java.io.BufferedWriter.write(Unknown Source)     at java.io.Writer.write(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source)  at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source)   at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source)  at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source)   at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source)     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)    at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)  at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe   at java.net.SocketOutputStream.socketWrite0(Native Method)  at java.net.SocketOutputStream.socketWrite(Unknown Source)  at java.net.SocketOutputStream.write(Unknown Source)    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761)  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)  at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785)    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)   at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598)     at org.apache.coyote.Response.doWrite(Response.java:533)    at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)     ... 35 more </trace>

This is what I have so far:

([^ ]*)\t(-|[0-9]*)\t

which returns:

Match 1
1.  14.196.202.16:9123
2.  11329

That correctly gives me the first two fields. But when I add the date like this:

([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t

I get back:

Match 1
1.  17:50:26.965    -5                    Thread-14960    CCS    6104    1    Audit.rds.CCS        reportDataService
2.   
3.  Failure
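
A likely explanation, assuming the fields are tab-separated and the tester runs an unanchored search: [^ ] excludes spaces but not tabs, and the timestamp contains a space. No stretch of text before that space holds three tabs, so the engine retries further right and the first match starts at 17:50:26.965. The sketch below reproduces this in Python on a shortened, hypothetical copy of the line (the tab positions are inferred from the post, not confirmed), then tries an anchored variant that uses [^\t]* instead.

```python
import re

# Shortened, hypothetical copy of the log line. The tab positions are an
# assumption inferred from the post, not confirmed against the real file.
line = "\t".join([
    "14.196.202.16:9123", "11329", "2016-01-27 17:50:26.965", "-5",
    "", "", "", "", "Thread-14960", "CCS", "6104", "1",
    "Audit.rds.CCS", "", "reportDataService", "", "Failure",
    "<messages>RDS-ERR-1047 Unable to process the XML output stream.</messages>",
    "<trace>java.net.SocketException: Broken pipe</trace>",
])

original = re.compile(r"([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t")
m = original.search(line)  # unanchored, like an online regex tester

# The match does not start at column 0. It starts right after the space
# inside the timestamp, because [^ ] cannot cross that space.
shifted_start = m.start()
print(shifted_start, repr(m.group(2)), repr(m.group(3)))

# Anchored variant: [^\t]* stops only at a tab, so the space in the
# timestamp stays inside the third group.
anchored = re.compile(r"^([^\t]*)\t([0-9]*)\t([^\t]*)\t")
fixed = anchored.match(line)
print(fixed.groups())
```

Anchoring at the start of the line also matters for Hive, since a pattern that can slide to a later position hides this kind of mistake in a tester.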

I'm very new to regular expressions and I'm trying to figure this out, but I'm having trouble. I've also been testing with this site:

http://rubular.com/

Basically, I'm trying to get the output to look like this:

1. 14.196.202.16:9123   
2. 11329    
3. 2016-01-27 17:50:26.965 -5
4. 
5. 
6. 
7. 
8. Thread-14960 
9. CCS  
10. 6104    
11. 1   
12. Audit.rds.CCS   
13. 
14. reportDataService   
15. 
16. Failure 
17. <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>   
19. <trace>ClientAbortException:  java.net.SocketException: Broken pipe     at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)  at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)     at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)  at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)   at java.io.BufferedOutputStream.write(Unknown Source)   at java.io.BufferedOutputStream.write(Unknown Source)   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)  at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)   at sun.nio.cs.StreamEncoder.write(Unknown Source)   at java.io.OutputStreamWriter.write(Unknown Source)     at java.io.BufferedWriter.flushBuffer(Unknown Source)   at java.io.BufferedWriter.write(Unknown Source)     at java.io.Writer.write(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source)  at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source)   at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source)  at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source)   at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source)     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)    at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)  at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe   at java.net.SocketOutputStream.socketWrite0(Native Method)  at java.net.SocketOutputStream.socketWrite(Unknown Source)  at java.net.SocketOutputStream.write(Unknown Source)    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761)  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)  at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785)    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)   at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598)     at org.apache.coyote.Response.doWrite(Response.java:533)    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)     ... 35 more </trace>

Edit:

I think I'm heading in the right direction here.

I now have this:

([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t(\w+|\S+|\s+)\t(\w+|.)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t

But I still can't capture <message> and <trace> as groups.
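
None of the alternatives \w+, \S+ or \s+ can span text that mixes spaces with non-space characters, so a group built from them cannot take in the message text, which contains spaces. One possible way around this, sketched in Python under the assumption that every line has 19 tab-separated fields and that the stack trace may itself contain tabs, is to capture everything up to the next tab for the first 18 fields and let the last group take the rest of the line. The sample line is a shortened, hypothetical copy.

```python
import re

NUM_FIELDS = 19  # assumption: the field count is inferred, not confirmed

# 18 groups that each run up to the next tab, then one catch-all group.
pattern = re.compile(r"([^\t]*)\t" * (NUM_FIELDS - 1) + r"(.*)")

# Hypothetical, shortened sample. The tab inside the trace is an assumption.
sample_fields = [
    "14.196.202.16:9123", "11329", "2016-01-27 17:50:26.965", "-5",
    "", "", "", "", "Thread-14960", "CCS", "6104", "1",
    "Audit.rds.CCS", "", "reportDataService", "", "Failure",
    "<messages><message><messageString>RDS-ERR-1047 Unable to process the "
    "XML output stream. The XML is invalid.</messageString></message>",
    "<trace>ClientAbortException:  java.net.SocketException: Broken pipe"
    "\tat org.apache.catalina.connector.OutputBuffer.realWriteBytes"
    "(OutputBuffer.java:369)\t... 35 more </trace>",
]
line = "\t".join(sample_fields)

# fullmatch requires the whole line to match, not just a prefix.
m = pattern.fullmatch(line)
captured = m.groups()

for number, value in enumerate(captured, start=1):
    print(number, repr(value))
```

Because [^\t]* also matches an empty string, the empty columns come back as empty groups instead of shifting the later fields.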

1 Answer:

Answer 0: (score: 1)

I got the regex working. This is what I ended up with:

([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)
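
A quick way to sanity-check this pattern outside Hive is to require a whole-line match, which is how RegexSerDe applies input.regex as far as I know. The Python sketch below does that on a shortened, hypothetical copy of the line (tab positions assumed). It also shows one limitation: [\d+] is a character class that matches a single digit or a literal +, so (-[\d+]) accepts -5 but not a two-digit value such as -10.

```python
import re

# Pattern copied from the answer above, split across lines for readability.
ANSWER = re.compile(
    r"([\d+]\S+[\d+])\t(\d+)\t"
    r"([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t"
    r"([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t"
    r"([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t"
    r"([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t"
    r"([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t"
    r"([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t"
    r"([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)"
)


def make_line(offset):
    """Build a hypothetical sample line; only the fourth field varies."""
    return "\t".join([
        "14.196.202.16:9123", "11329", "2016-01-27 17:50:26.965", offset,
        "", "", "", "", "Thread-14960", "CCS", "6104", "1",
        "Audit.rds.CCS", "", "reportDataService", "", "Failure",
        "<messages>RDS-ERR-1047 Unable to process the XML output stream.</messages>",
        "<trace>java.net.SocketException: Broken pipe ... 35 more </trace>",
    ])


ok = ANSWER.fullmatch(make_line("-5"))    # single-digit value, as in the sample
bad = ANSWER.fullmatch(make_line("-10"))  # two-digit value

print(ok is not None, bad is None)
```

If values like -10 can occur in the real logs, writing that group as (-?\d+) would be one way to allow them. That change is a suggestion, not something the original answer tested.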