How to specify more than one grokCustomPatterns in Athena?

Asked: 2018-07-25 04:32:37

Tags: logstash-grok amazon-athena

I'm trying to use Grok expressions in Athena, mostly as a tool to debug Grok expressions in AWS Glue Classifiers.

This works:

CREATE EXTERNAL TABLE example_grok (
  myColumn string
)
ROW FORMAT SERDE
 'com.amazonaws.glue.serde.GrokSerDe'
WITH SERDEPROPERTIES (
'input.format'='(%{WORD:header},%{WORD:file_type},%{GREEDYDATA:head_rest})|(%{DETAILS:det},%{WORD:icp_number},%{GREEDYDATA:det_rest})',
'input.grokCustomPatterns' = 'DETAILS DET'
)
STORED AS INPUTFORMAT
 'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
 's3://my-secret-bucket/path/';

I would like to specify several custom patterns, but the documentation doesn't have an example, and none of the delimiters that I have tried, either inside or outside of the string, have worked.

For example, these do NOT work:

New line delimited (with no leading spaces, those are just for this post):

 'input.grokCustomPatterns' = 
 'POSTFIX_QUEUEID [0-9A-F]{7,12}
HEADER HDR'

As a "json" array:

'input.grokCustomPatterns' = ['POSTFIX_QUEUEID [0-9A-F]{7,12}','HEADER HDR']

With multiple entries:

'input.grokCustomPatterns'='HEADER (HDR)',
'input.grokCustomPatterns'='POSTFIX_QUEUEID [0-9A-F]{7,12}',

Any assistance is appreciated.

2 answers:

Answer 0 (score: 0)

In case you haven't found an answer yet: for me, it worked when I copied the custom patterns from an editor, with each pattern on its own line.

Answer 1 (score: 0)

AWS responded to my request for a documentation improvement: a literal \n separates the patterns.

  

要在input.grokCustomPatterns中包含多个模式条目   表达式,使用换行符(\ n)分隔它们,如下   如下:'input.grokCustomPatterns'='INSIDE_QS   ([^ \“] )\ nINSIDE_BRACKETS([^ \]] )')。

Grok Serde
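
Applied to the table from the question, the \n-separated form might look like the sketch below. The table name, SerDe class, and S3 location are copied from the question. Defining both HEADER and DETAILS in one property, and using %{HEADER:header} in input.format, are assumptions made for illustration and have not been checked against real data. The leading -- comment lines are annotations only and can be removed before running the statement.

```sql
-- Sketch only: two custom Grok patterns in a single property value.
-- The separator is the two characters backslash + n inside the quoted
-- string, not an actual line break in the statement.
-- Assumption: HEADER and DETAILS are the illustrative patterns from the question.
CREATE EXTERNAL TABLE example_grok (
  myColumn string
)
ROW FORMAT SERDE
  'com.amazonaws.glue.serde.GrokSerDe'
WITH SERDEPROPERTIES (
  'input.format'='(%{HEADER:header},%{WORD:file_type},%{GREEDYDATA:head_rest})|(%{DETAILS:det},%{WORD:icp_number},%{GREEDYDATA:det_rest})',
  'input.grokCustomPatterns' = 'HEADER HDR\nDETAILS DET'
)
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://my-secret-bucket/path/';
```

Each entry keeps the usual "NAME regex" shape, so further patterns would presumably be appended with another \n before each one.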