I'm trying to create a regex SerDe in Hive to read some log files, but I'm having trouble getting it to work...
The log file looks like this:
14.196.202.16:9123 11329 2016-01-27 17:50:26.965 -5 Thread-14960 CCS 6104 1 Audit.rds.CCS reportDataService Failure <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message> <trace>ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at java.io.BufferedOutputStream.write(Unknown Source) at java.io.BufferedOutputStream.write(Unknown Source) at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.BufferedWriter.write(Unknown Source) at java.io.Writer.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source) at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source) at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598) at org.apache.coyote.Response.doWrite(Response.java:533) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ... 35 more </trace>
This is what I have so far:
([^ ]*)\t(-|[0-9]*)\t
which returns:
Match 1
1. 14.196.202.16:9123
2. 11329
That correctly gives me the first two fields... but when I add the date like this:
([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t
I get back:
Match 1
1. 17:50:26.965 -5 Thread-14960 CCS 6104 1 Audit.rds.CCS reportDataService
2.
3. Failure
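A note on why the third group misbehaves: [^ ]* excludes only spaces, so it can run across tab characters, and the timestamp itself contains a space, so no match can start at the beginning of the line. The sketch below reproduces that on a shortened, hypothetical line (it assumes the fields are tab-separated, which the paste above does not show) and compares it with a tab-excluding class anchored at the start of the line. Python's re module is used purely for illustration; Hive uses Java regular expressions, which I would expect to behave the same way for these constructs.

```python
import re

# Hypothetical, shortened line. Assumption: fields are separated by tabs and
# the timestamp contains a single space, as in the sample log entry.
line = "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\tThread-14960\t\tFailure\t"

# The pattern from the question: [^ ]* forbids spaces but happily matches tabs.
space_based = re.compile(r"([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t")

# Alternative sketch: forbid tabs instead, and anchor at the start of the line.
tab_based = re.compile(r"^([^\t]*)\t([^\t]*)\t([^\t]*)\t")

space_groups = space_based.search(line).groups()
tab_groups = tab_based.search(line).groups()

print(space_groups)  # the match starts after the space inside the timestamp
print(tab_groups)    # IP:port, number, full timestamp
```

With the unanchored, space-excluding pattern the engine skips ahead to the first position where three tabs can be reached without crossing a space, which is why the first group begins at the time of day.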
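I'm very new to regular expressions and I'm trying to work this out, but I'm having trouble... I have also been trying to use this site:
Basically, I'm trying to get it to look like this: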
1. 14.196.202.16:9123
2. 11329
3. 2016-01-27 17:50:26.965 -5
4.
5.
6.
7.
8. Thread-14960
9. CCS
10. 6104
11. 1
12. Audit.rds.CCS
13.
14. reportDataService
15.
16. Failure
17. <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>
19. <trace>ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at java.io.BufferedOutputStream.write(Unknown Source) at java.io.BufferedOutputStream.write(Unknown Source) at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.BufferedWriter.write(Unknown Source) at java.io.Writer.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source) at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source) at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598) at org.apache.coyote.Response.doWrite(Response.java:533) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ... 35 more </trace>
Edit:
So I think I'm heading in the right direction here. I now have this:
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t(\w+|\S+|\s+)\t(\w+|.)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t
But I still can't get the <message> and <trace> parts into their own groups.
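One possible way to isolate those two blocks is to match on the tags themselves rather than on character classes. The following is only a sketch on a shortened, hypothetical tail of a log line; it assumes the two blocks are separated by some mix of tabs and spaces, and it ends the first group at </message> because the sample line has no closing </messages> tag. It is written with Python's re module for illustration, not against Hive itself.

```python
import re

# Hypothetical, shortened tail of a log line. Assumption: only whitespace
# (tabs and/or spaces) sits between the message block and the trace block.
tail = (
    "Failure\t<messages><message><messageString>RDS-ERR-1047 Unable to process "
    "the XML output stream. The XML is invalid.</messageString></message>\t\t"
    "<trace>ClientAbortException: java.net.SocketException: Broken pipe ... 35 more </trace>"
)

# Lazy .*? stops at the first </message>; greedy .* runs to the last </trace>.
pattern = re.compile(r"(<messages>.*?</message>)\s*(<trace>.*</trace>)")

messages, trace = pattern.search(tail).groups()

print(messages)
print(trace)
```

The lazy quantifier does not stop early at </messageString>, because the literal </message> requires the closing angle bracket immediately after the word "message".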
Answer 0 (score: 1)
I got the regex working... this is what I ended up with:
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)
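A side note on the pattern above, for anyone adapting it: inside square brackets a + is a literal character, so [\d+] matches exactly one digit or one plus sign rather than "one or more digits", and [a-zA-Z0-9_\S] accepts the same characters as plain \S. The pattern works because the \S+ in between does the real matching. The check below is a small sketch in Python (used only for illustration; the sample strings are made up) of both points.

```python
import re

# Inside a character class, "+" is a literal, not a quantifier.
single_digit = re.fullmatch(r"[\d+]", "7") is not None
literal_plus = re.fullmatch(r"[\d+]", "+") is not None
two_digits = re.fullmatch(r"[\d+]", "42") is not None

# \S already covers letters, digits and underscore, so the longer class
# adds nothing. Compare both forms on a few made-up field values.
verbose = re.compile(r"[a-zA-Z0-9_\S]*")
compact = re.compile(r"\S*")
samples = ["Thread-14960", "Audit.rds.CCS", "", "two words"]
same_behaviour = all(
    (verbose.fullmatch(s) is None) == (compact.fullmatch(s) is None)
    for s in samples
)

print(single_digit, literal_plus, two_digits, same_behaviour)
```

If that holds for the real data as well, the field groups could likely be shortened to (\S*) or ([^\t]*) without changing what they capture, though that is untested against the actual log files.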