I'm trying to get logging from a custom (Node.js) application into Elasticsearch, to then be handled by Kibana. My environment is Ubuntu with ELK (Elasticsearch, Logstash and Kibana); the log-generating application is written in Node.js.
I already have standard log files such as syslog and nginx being processed. The ELK stack and the application sit on different servers.
Since the log file is created by our own application, it contains entries with a variety of patterns, but every entry begins with a common header [example: 2015-03-17T11:26:27.285Z (INFO|dev3) Creating interaction document...], i.e. [date+timestamp (msg-level|system-ID) some message text].
That is usually the whole log entry, but sometimes it is followed by a JSON object, and depending on the message generated it may be a different JSON object each time. When a JSON object follows (starting on the next line), the header line ends with "..." (minus the quotes), but not every line ending that way is followed by a JSON object.
As a first step I want to pull the whole multi-line JSON object in as part of the message. Right now I'm using the syslog filter, so every line arrives as a separate message. My end goal is then to parse the JSON object and store its fields separately, so that Kibana can filter cleanly on their individual values.
From what I've seen so far there are two ways to do it.
My first question is: which approach is more flexible in the long run? Creating a multiline filter and importing each JSON object as a single message is probably the quickest. But writing directly to Elasticsearch would make it easier to bring in different JSON objects and expose the individual fields to filters, which is probably my long-term goal.
I've included some dummy sample log data below to show what I'm dealing with.
Thanks
2015-03-17T11:26:27.285Z (INFO|dev3) Creating interaction document...
{ req_url: '/nL4sWsw',
company_cookie: '68d1dc4a32ed3bfd22c96a6e60a132924e5d8fa8',
browsing_cookie: '68d1dc4a32ed3bfd22c96a6e60a132924e5d8fa8',
visit_count: 1,
campaign_id: 52d6ab20bbc1e6ac0500032f,
switchboard_id: 54888c6ffc4ac2cb18a3b8c6,
content_key: '2d0515120561b7be80c936027f6dce71b41a0391',
http_header:
{ 'x-host': 'subdomain.company.org',
'x-ip': '111.222.333.444',
host: 'logic9',
connection: 'close',
'user-agent': 'Mozilla/5.0 (compatible; ext-monitor - premium monitoring service; http://www.ext-monitor.com)' },
timestamp: Tue Mar 17 2015 06:26:27 GMT-0500 (CDT),
url: 'https://cdn.company.org/2d0515120561b7be80c936027f6dce71b41a0391/',
type7id: 'nL4sWsw',
pid: undefined,
onramp_type: 'type7',
http_user_agent: 'Other',
http_browser: 'Other' }
2015-03-17T11:26:27.285Z (INFO|dev3) Inserting interactions data...
{ 'statistics.total_interactions': 1,
'statistics.day_of_week.tuesday': 1,
'statistics.onramp_type.type7': 1,
'statistics.hour_of_day.11': 1,
'statistics.operating_systems.other': 1,
'statistics.browser_types.other': 1 }
2015-03-17T11:26:27.286Z (INFO|dev3) Updating campaign 52d6ab20bbc1e6ac0500032f with stats {"statistics.total_interactions":1,"statistics.day_of_week.tuesday":1,"statistics.onramp_type.type7":1,"statistics.hour_of_day.11":1,"statistics.operating_systems.other":1,"statistics.browser_types.other":1} ...
2015-03-17T11:26:27.286Z (INFO|dev3) Redirecting to https://cdn.company.org/2d0515120561b7be80c936027f6dce71b41a0391/ ...
2015-03-17T11:26:27.286Z (INFO|dev3) Campaign statistics recorded successfully
2015-03-17T11:26:27.287Z (INFO|dev3) GET /zVoxiPV
2015-03-17T11:26:27.287Z (INFO|dev3) GET /vumkm3A
2015-03-17T11:26:27.287Z (INFO|dev3) Starting response for type7v1 ...
2015-03-17T11:26:27.287Z (INFO|dev3) Header: {"x-host":"subdomain.company.org","x-ip":"111.222.333.444","host":"logic9","connection":"close","user-agent":"Mozilla/5.0 (compatible; ext-monitor - premium monitoring service; http://www.ext-monitor.com)"}
2015-03-17T11:26:27.287Z (INFO|dev3) Params: {"tid":"zVoxiPV"}
2015-03-17T11:26:27.287Z (INFO|dev3) Sending taIdentity cookie: f79b8ceca66f99608fb1291ab51d65b08fa3138f ...
2015-03-17T11:26:27.287Z (INFO|dev3) Sending taBrowse cookie: f79b8ceca66f99608fb1291ab51d65b08fa3138f ...
2015-03-17T11:26:27.287Z (INFO|dev3) Sending new cookie: 96ec5414d0b847790f58a1feee2399d282cf7907 with visit count 1 ...
2015-03-17T11:26:27.288Z (INFO|dev3) Finding in switchboard {"active":true,"campaign.start_at":{"$lte":"2015-03-17T11:26:27.287Z"},"campaign.end_at":{"$gte":"2015-03-17T11:26:27.287Z"},"type7id":"zVoxiPV"}
2015-03-17T11:26:27.288Z (INFO|dev3) Starting response for type7v1 ...
2015-03-17T11:26:27.288Z (INFO|dev3) Header: {"x-host":"subdomain.company.org","x-ip":"111.222.333.444","host":"logic9","connection":"close","user-agent":"Mozilla/5.0 (compatible; ext-monitor - premium monitoring service; http://www.ext-monitor.com)"}
2015-03-17T11:26:27.288Z (INFO|dev3) Params: {"tid":"vumkm3A"}
2015-03-17T11:26:27.288Z (INFO|dev3) Sending taIdentity cookie: adec72a656ef7999d101edc7e1e9cf901e1e56c9 ...
2015-03-17T11:26:27.288Z (INFO|dev3) Sending taBrowse cookie: adec72a656ef7999d101edc7e1e9cf901e1e56c9 ...
2015-03-17T11:26:27.288Z (INFO|dev3) Sending new cookie: 0c1354b30bf261595bf24a14c2e90ecac64545ed with visit count 1 ...
2015-03-17T11:26:27.288Z (INFO|dev3) Finding in switchboard {"active":true,"campaign.start_at":{"$lte":"2015-03-17T11:26:27.288Z"},"campaign.end_at":{"$gte":"2015-03-17T11:26:27.288Z"},"type7id":"vumkm3A"}
2015-03-17T11:26:27.289Z (INFO|dev3) Finding in matching set [object Object]
2015-03-17T11:26:27.289Z (INFO|dev3) Switchboard item {"_id":"5488a7ea60c5508693bebba7","content_provider":"redirect","content":{"_id":"54b8954eca0ca5eb87cb4fef","name":"Content for Switchboard 5488a7ea60c5508693bebba7","key":"ad354806eadd0f90ef55b1ab96a8c84272401186"},"type":"redirect","campaign":{"end_at":"2018-12-11T00:00:00.000Z","start_at":"2008-12-11T00:00:00.000Z","_id":"52a9dd9bfb9c94150600032f"}}
2015-03-17T11:26:27.289Z (INFO|dev3) No url for redirect, going local...
2015-03-17T11:26:27.289Z (INFO|dev3) url: https://cdn.company.org/ad354806eadd0f90ef55b1ab96a8c84272401186/
2015-03-17T11:26:27.289Z (INFO|dev3) Sending redirect to https://cdn.company.org/ad354806eadd0f90ef55b1ab96a8c84272401186/ ...
2015-03-17T11:26:27.289Z (INFO|dev3) Creating interaction document...
{ req_url: '/zVoxiPV',
company_cookie: 'f79b8ceca66f99608fb1291ab51d65b08fa3138f',
browsing_cookie: 'f79b8ceca66f99608fb1291ab51d65b08fa3138f',
visit_count: 1,
campaign_id: 52a9dd9bfb9c94150600032f,
switchboard_id: 5488a7ea60c5508693bebba7,
content_key: 'ad354806eadd0f90ef55b1ab96a8c84272401186',
http_header:
{ 'x-host': 'subdomain.company.org',
'x-ip': '111.222.333.444',
host: 'logic9',
connection: 'close',
'user-agent': 'Mozilla/5.0 (compatible; ext-monitor - premium monitoring service; http://www.ext-monitor.com)' },
timestamp: Tue Mar 17 2015 06:26:27 GMT-0500 (CDT),
url: 'https://cdn.company.org/ad354806eadd0f90ef55b1ab96a8c84272401186/',
type7id: 'zVoxiPV',
pid: undefined,
onramp_type: 'type7',
http_user_agent: 'Other',
http_browser: 'Other' }
Answer 0 (score: 0)
Forget about creating an application that writes logs straight into Elasticsearch; you would just be reinventing the wheel. Logstash can do this for you, you just need to read up a little on how to make it do what you want. When you pass a JSON-encoded message through the json filter in Logstash, it extracts the key-value pairs, and once the event is sent to Elasticsearch the data is indexed and searchable.
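As a minimal sketch (assuming the whole message field of an event is one JSON document), the json filter on its own looks like this:

filter {
  json {
    # parse the event's message field as JSON;
    # each key in the document becomes a separate field on the event
    source => "message"
  }
}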
The first thing I'd suggest you do is put a multiline filter in to get the JSON-encoded data onto a single line. I've only ever used the multiline filter to rejoin lines that have one identifying feature you can match at the start or end of the line. In your case I can't see one, but I think you can chain two multiline filters together:
filter {
  multiline {
    # join any line starting with whitespace onto the previous line
    what => "previous"
    pattern => "^\s"
  }
  multiline {
    # join any line starting with { onto the previous line
    what => "previous"
    pattern => "^\{"
  }
}
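Filters run in the order they appear, so the whitespace rule first reassembles the indented JSON body into a single event beginning with {, and the second rule then folds that whole block onto the header line above it.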
After the multiline filters I would use a grok filter. That can pull out the date and any other parts of the message, and you should be able to use it to capture the JSON-encoded part into a field of its own, which you can then run through the json filter.
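A rough sketch of that stage, assuming the multiline filters above have already joined each JSON block onto its header line (the field names log_timestamp, level, system_id, log_message and json_payload are illustrative, not anything from the post):

filter {
  grok {
    # split the common header: date+timestamp (msg-level|system-ID) message text
    match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} \(%{LOGLEVEL:level}\|%{WORD:system_id}\) %{GREEDYDATA:log_message}" }
  }
  grok {
    # if the message carries a JSON object, capture it into its own field;
    # don't mark events that have no JSON part as grok failures
    match => { "log_message" => "(?<json_payload>\{.*\})" }
    tag_on_failure => []
  }
  if [json_payload] {
    json {
      source => "json_payload"
    }
  }
}

One caveat: the json filter only accepts valid JSON, and the multi-line blocks in the sample (unquoted keys, single quotes, bare dates, undefined) look like Node's util.inspect output rather than JSON.stringify output, so the application would need to emit real JSON first — which is essentially the point of the next answer.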
Answer 1 (score: 0)
I have a lot of experience with Logstash and its multiline filter, and I can tell you that it is very fragile and very hard to debug when something goes wrong.
If you remove all the newlines and make sure it is proper JSON, Logstash can ingest the JSON without any problem. So my advice is to make sure the application writes its JSON in a way that is easy for Logstash to consume, especially since it's a custom application.
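For instance, if the app logs one valid JSON object per line (JSON.stringify output rather than the util.inspect-style dumps in the sample), a json codec on the input side replaces the whole multiline/grok dance; a minimal sketch, with a hypothetical log path:

input {
  file {
    # hypothetical path; point it at wherever the app writes its log
    path => "/var/log/myapp/app.json.log"
    # decode each line as a complete JSON document
    codec => "json"
  }
}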