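Flattening nested events/documents in Logstash

Date: 2016-05-03 12:12:58

Tags: xml logstash logstash-configuration

I use Logstash to parse nested, multi-line XML documents and forward them to Elasticsearch.

Such a document might look like this: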

<?xml version="1.0" encoding="UTF-8"?>
<Root_Element xmlns:ns2="some_namespace">
    <creationTime>2016-02-05T00:27:29.752Z</creationTime>
    <provider>some_provider</provider>
    <Event>
        <eventId>111999_0</eventId>
        <something_interesting some_attribute="foo" other_attribute="bar" yet_another_attribute="whatever"/>
        <eventStartTime>2016-01-22T04:00:00Z</eventStartTime>
        <eventStopTime>2016-02-19T18:00:00Z</eventStopTime>
        <location loc_attribute="fooz" other_loc_attribute="unknown" and_one_more="hooray">
            <xy lat="51.514728" lon="-0.073563" name="some_name" direction="north"/>
        </location>
        <comment language="en">Some text comment.</comment>
        <comment language="en">Some other text comment.</comment>
    </Event>
</Root_Element>

To read this document in Logstash, I use the following configuration file:

##########
# INPUT
##########

input {

    # listen on tcp
    tcp {
        port => 9000

        # do not split events on newlines but read multiple lines at once instead
        # events start with <Event>, everything that is not <Event> or </Root_Element> belongs to the previous event
        codec => multiline {
            pattern => "(?=<Event>)(?=</Root_Element>)"
            negate => "true"
            what => "previous"
        }
    }
}

##########
# FILTER
##########

filter {

    # parse event input as Xml
    xml {
        source => "message"
        remove_namespaces => true
        store_xml => true
        target => "parsed"
    }

    # split event by Event tag
    split {
        field => "parsed[Event]"
    }

    # flatten the nested event structure on to the root level
    ruby {
        code => "

            event['parsed']['Event'].each do |key, value|
                event[key] = value[0]
            end

        "
    }

    # remove unnecessary fields from the output
    mutate {
        remove_field => ["message", "parsed", "host", "port", "tags"]
    }

}

##########
# OUTPUT
##########

output {

    # forward event to the elasticsearch host
    # elasticsearch {
        # hosts => ["elasticsearch"]
    # }

    # write event on stdout for debugging
    stdout {
        codec => rubydebug
    }
}

To test this, save the XML content above to a file, start Logstash with the configuration shown, and send the XML to Logstash with cat filename.xml | nc <logstash_ip_or_hostname> 9000.

This produces the following output in Logstash:

{
               "@timestamp" => "2016-05-03T11:51:39.777Z",
                 "@version" => "1",
                  "eventId" => "111999_0",
    "something_interesting" => {
               "some_attribute" => "foo",
              "other_attribute" => "bar",
        "yet_another_attribute" => "whatever"
    },
           "eventStartTime" => "2016-01-22T04:00:00Z",
            "eventStopTime" => "2016-02-19T18:00:00Z",
                 "location" => {
              "loc_attribute" => "fooz",
        "other_loc_attribute" => "unknown",
               "and_one_more" => "hooray",
                         "xy" => [
            [0] {
                      "lat" => "51.514728",
                      "lon" => "-0.073563",
                     "name" => "some_name",
                "direction" => "north"
            }
        ]
    },
                  "comment" => {
        "language" => "en",
         "content" => "Some text comment."
    }
}

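However, this is not what I want: the event contains string values (e.g. eventId), object values (e.g. something_interesting), and array-of-object values (e.g. location => xy). I would prefer the final event to be flat rather than nested, because handling nested data in Elasticsearch and Kibana causes some problems.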

In addition, the original XML content has two <comment> tags, but for some reason the second one does not make it into the output.
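
A likely explanation for the missing comment, assuming the xml filter delivers every child element wrapped in an array (which the xy array in the output suggests): value[0] in the ruby filter keeps only the first element of each array. The plain-Ruby sketch below reproduces this with a hand-written hash. parsed_event is a hypothetical stand-in for event['parsed']['Event'], not actual Logstash output.

```ruby
# Hypothetical stand-in for event['parsed']['Event'] after the split filter.
# Assumption: every child element arrives wrapped in an array.
parsed_event = {
  'eventId' => ['111999_0'],
  'comment' => [
    { 'language' => 'en', 'content' => 'Some text comment.' },
    { 'language' => 'en', 'content' => 'Some other text comment.' }
  ]
}

# Same loop as in the ruby filter above, writing to a plain Hash
# instead of the Logstash event.
flattened = {}
parsed_event.each do |key, value|
  flattened[key] = value[0] # only the first array element is kept
end

puts flattened['comment'].inspect
```

In this sketch flattened['comment'] holds only the first comment hash. The second one is silently dropped, which matches the output shown above.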

I would like the output to look like this:

{
    "@timestamp" => "2016-05-03T12:00:54.182Z",
    "@version" => "1",
    "eventId" => "111999_0",
    "something_interesting.some_attribute" => "foo",
    "something_interesting.other_attribute" => "bar",
    "something_interesting.yet_another_attribute" => "whatever",
    "eventStartTime" => "2016-01-22T04:00:00Z",
    "eventStopTime" => "2016-02-19T18:00:00Z",
    "location.loc_attribute" => "fooz",
    "location.other_loc_attribute" => "unknown",
    "location.and_one_more" => "hooray",
    "xy.0.lat" => "51.514728",
    "xy.0.lon" => "-0.073563",
    "xy.0.name" => "some_name",
    "xy.0.direction" => "north",
    "comment.0.language" => "en",
    "comment.0.content" => "Some text comment.",
    "comment.1.language" => "en",
    "comment.1.content" => "Some other text comment.",
}

The separator in the keys does not have to be a dot; it could be anything else (I am not sure right now whether dots are allowed).

Any suggestions on how to achieve this? Do I have to write a custom Ruby plugin for this transformation, or can it be done with built-in plugins?
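
One possible direction, offered as a minimal sketch rather than a tested Logstash solution: a small recursive Ruby function that joins hash keys with a configurable separator and uses array indexes as key parts. The name flatten_value and the nested sample hash are hypothetical, and the sample assumes the xml filter wraps every child element in an array. Because it is unclear whether dots are allowed in keys, the separator is a parameter.

```ruby
# Recursively flatten nested Hashes and Arrays into one single-level Hash.
# Hash keys are joined with `sep`; Array elements contribute their index.
def flatten_value(value, prefix = nil, sep = '.', out = {})
  join = ->(part) { prefix ? "#{prefix}#{sep}#{part}" : part.to_s }
  case value
  when Hash
    value.each { |k, v| flatten_value(v, join.call(k), sep, out) }
  when Array
    if value.length == 1 && !value[0].is_a?(Hash) && !value[0].is_a?(Array)
      # A single scalar wrapped in an array: store it without an index.
      out[prefix] = value[0]
    else
      value.each_with_index { |v, i| flatten_value(v, join.call(i), sep, out) }
    end
  else
    out[prefix] = value
  end
  out
end

# Hypothetical sample shaped like the parsed <Event> element.
nested = {
  'eventId' => ['111999_0'],
  'location' => [
    { 'loc_attribute' => 'fooz', 'xy' => [{ 'lat' => '51.514728', 'lon' => '-0.073563' }] }
  ],
  'comment' => [
    { 'language' => 'en', 'content' => 'Some text comment.' },
    { 'language' => 'en', 'content' => 'Some other text comment.' }
  ]
}

flat = flatten_value(nested)
flat.each { |key, value| puts "#{key} => #{value}" }
```

Note that this sketch keeps the full path and indexes every array of objects, so it yields keys such as location.0.xy.0.lat rather than the xy.0.lat shown in the wished-for output; both comments survive as comment.0.* and comment.1.*. Inside a ruby filter, each resulting pair could presumably be written back with event[key] = value, as in the configuration above.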

0 answers