CloudWatch to Elasticsearch: parse/tokenize log events before pushing to ES

Time: 2018-06-04 16:39:17

Tags: elasticsearch aws-lambda amazon-cloudwatch amazon-ecs

Appreciate your help in advance.

In my scenario, CloudWatch multiline logs need to be shipped to the Elasticsearch service: ECS --awslogs--> CloudWatch --lambda--> ES domain. (This is the basic flow, though I am very open to changing how data gets shipped from CW to ES.)

I was able to solve the multi-line issue using multi_line_start_pattern (a config sketch is shown after the format below), BUT the main issue I am experiencing now is that my logs are in ODL format (the following format):

[yyyy-mm-ddThh:mm:ss.SSS-Z][ProductName-Version][Log Level]
[Message ID][LoggerName][Key Value Pairs][[
Message]]
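
For reference, the multiline fix was along these lines: a minimal sketch of a CloudWatch Logs agent stanza using multi_line_start_pattern (the path, group name, and pattern here are illustrative, not my exact config):

[/var/log/myapp/server.log]
file = /var/log/myapp/server.log
log_group_name = myapp-logs
log_stream_name = {instance_id}
# treat any line starting with a bracketed ISO timestamp as a new event
multi_line_start_pattern = ^\[\d{4}-\d{2}-\d{2}T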

And I would like to parse and tokenize the log events before storing them in ES (rather than storing the complete log line). For example:

[2018-05-31T11:08:49.148-0400] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=43 _ThreadName=Thread-8] [timeMillis: 1527692929148] [levelValue: 800] [[
[] INFO : (DummyApplicationFunctionJPADAO) EntityManagerFactory located under resource lookup name [null], resource name=AuthorizationPU]]

needs to be parsed and tokenized into the following fields:

    timestamp            2018-05-31T11:08:49.148-0400
    ProductName-Version  glassfish 4.1
    LogLevel             INFO
    MessageID
    LoggerName
    KeyValuePairs        tid: _ThreadID=43 _ThreadName=Thread-8
    Message              [] INFO : (DummyApplicationFunctionJPADAO)
                         EntityManagerFactory located under resource lookup name
                         [null], resource name=AuthorizationPU

In the above, the key-value pairs repeat and are variable; for simplicity I can store them all as one long string.
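
To make the target concrete, each event would ideally land in ES as a document shaped roughly like this (the field names are just my working names, not a fixed schema):

{
  "timestamp": "2018-05-31T11:08:49.148-0400",
  "productNameVersion": "glassfish 4.1",
  "logLevel": "INFO",
  "messageId": "",
  "loggerName": "",
  "keyValuePairs": "tid: _ThreadID=43 _ThreadName=Thread-8",
  "message": "[] INFO : (DummyApplicationFunctionJPADAO) EntityManagerFactory located under resource lookup name [null], resource name=AuthorizationPU"
}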

As far as I can tell about CloudWatch, the Subscription Filter pattern's regex support is very limited, and I am really not sure how to fit the above pattern into it. As for a lambda function that pushes the data to ES, I have not seen AWS docs or examples that use lambda to parse events before pushing them to ES.

I would appreciate guidance on what/where the best option is to parse CW logs before they get into ES: a Subscription Filter pattern, inside the lambda function, or any other way.

Thank you.

1 Answer:

Answer 0 (score: 0)

In my opinion, the best option is exactly what you suggested: a lambda triggered by CloudWatch Logs that reformats the logged data into your ES-preferred format and then posts it to ES.

You will need to subscribe this lambda to the CloudWatch logs. You can do this on the lambda console or on the CloudWatch console (https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html).
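
If you would rather create the subscription programmatically, here is a sketch with the node SDK (the log group, filter name, and lambda ARN are placeholders; the lambda must also grant logs.amazonaws.com permission to invoke it):

const AWS = require('aws-sdk');
const cwl = new AWS.CloudWatchLogs();

cwl.putSubscriptionFilter({
  logGroupName: '/ecs/my-app',    // placeholder log group
  filterName: 'ship-to-es',       // placeholder filter name
  filterPattern: '',              // empty pattern forwards every event
  destinationArn: 'arn:aws:lambda:us-east-1:123456789123:function:cw-to-es'
}, (err, data) => {
  if (err) console.error(err);
  else console.log('subscription created', data);
});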

The event payload to the lambda will be: { "awslogs": { "data": "encoded-logs" } }, where encoded-logs is a Base64 encoding of gzipped JSON.

For example, the sample event (https://docs.aws.amazon.com/lambda/latest/dg/eventsources.html#eventsources-cloudwatch-logs) can be decoded in node, e.g., with:

const zlib = require('zlib');

// event.awslogs.data is Base64-encoded, gzipped JSON
const data = event.awslogs.data;
const gzipped = Buffer.from(data, 'base64'); // Base64 -> gzipped bytes
const json = zlib.gunzipSync(gzipped);       // gunzip -> JSON string
const logs = JSON.parse(json);
console.log(logs);
/*
  { messageType: 'DATA_MESSAGE',
    owner: '123456789123',
    logGroup: 'testLogGroup',
    logStream: 'testLogStream',
    subscriptionFilters: [ 'testFilter' ],
    logEvents:
     [ { id: 'eventId1',
         timestamp: 1440442987000,
         message: '[ERROR] First test message' },
       { id: 'eventId2',
         timestamp: 1440442987001,
         message: '[ERROR] Second test message' } ] }
*/

From what you have outlined, you will want to pull out the logEvents array and parse each message into fields. I am happy to help with that too if you need it (but I would need to know which language you are writing the lambda in; there are libraries for tokenizing ODL, so hopefully it will not be too difficult).
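
For instance, if the lambda is in node, a rough tokenizer for the format in your sample line might look like this (the regex and field names are mine, and they assume the bracketed fields arrive in the order your example shows):

// Rough ODL tokenizer: five fixed leading fields, then any number of
// [key: value] groups kept as one string, then the [[...]] message body.
const ODL_PATTERN = /^\[(.*?)\] \[(.*?)\] \[(.*?)\] \[(.*?)\] \[(.*?)\] ((?:\[.*?\] ?)*)\[\[([\s\S]*?)\]\]$/;

function parseOdl(message) {
  const match = ODL_PATTERN.exec(message.trim());
  if (!match) return { message };          // fall back to the raw line
  return {
    timestamp: match[1],
    productNameVersion: match[2],
    logLevel: match[3],
    messageId: match[4],
    loggerName: match[5],
    keyValuePairs: match[6].trim(),        // e.g. "[tid: ...] [timeMillis: ...]"
    message: match[7].trim()
  };
}

const docs = logs.logEvents.map(e => parseOdl(e.message));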

At that point, you can POST those new records directly into your AWS ES domain. The S3-to-ES guide outlines how to do something similar in python: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-aws-integrations.html#es-aws-integrations-s3-lambda-es
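
A bare-bones version of that POST in node could look like the following (the endpoint and index path are placeholders; note that if the domain's access policy is IAM-based, the request additionally has to be SigV4-signed, which is what the python guide's auth setup takes care of on its side):

const https = require('https');

function postToEs(doc, callback) {
  const body = JSON.stringify(doc);
  const req = https.request({
    host: 'my-domain.us-east-1.es.amazonaws.com', // placeholder ES endpoint
    path: '/ecs-logs/odl',                        // placeholder index/type
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body)
    }
  }, res => {
    let out = '';
    res.on('data', chunk => { out += chunk; });
    res.on('end', () => callback(null, res.statusCode, out));
  });
  req.on('error', callback);
  req.end(body);
}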

You can find a complete lambda example (written by someone else) here: https://github.com/blueimp/aws-lambda/tree/master/cloudwatch-logs-to-elastic-cloud