Question

I use Logstash to parse nested, multi-line XML documents and forward them to Elasticsearch.

Such a document might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<Root_Element xmlns:ns2="some_namespace">
    <creationTime>2016-02-05T00:27:29.752Z</creationTime>
    <provider>some_provider</provider>
    <Event>
        <eventId>111999_0</eventId>
        <something_interesting some_attribute="foo" other_attribute="bar" yet_another_attribute="whatever"/>
        <eventStartTime>2016-01-22T04:00:00Z</eventStartTime>
        <eventStopTime>2016-02-19T18:00:00Z</eventStopTime>
        <location loc_attribute="fooz" other_loc_attribute="unknown" and_one_more="hooray">
            <xy lat="51.514728" lon="-0.073563" name="some_name" direction="north"/>
        </location>
        <comment language="en">Some text comment.</comment>
        <comment language="en">Some other text comment.</comment>
    </Event>
</Root_Element>

To read this document in Logstash, I use the following configuration file:

##########
# INPUT
##########

input {

    # listen on tcp
    tcp {
        port => 9000

        # do not split events on newlines but read multiple lines at once instead
        # events start with <Event>, everything that is not <Event> or </Root_Element> belongs to the previous event
        codec => multiline {
            pattern => "(?=<Event>)(?=</Root_Element>)"
            negate => "true"
            what => "previous"
        }
    }
}

##########
# FILTER
##########

filter {

    # parse event input as Xml
    xml {
        source => "message"
        remove_namespaces => true
        store_xml => true
        target => "parsed"
    }

    # split event by Event tag
    split {
        field => "parsed[Event]"
    }

    # flatten the nested event structure on to the root level
    ruby {
        code => "

            event['parsed']['Event'].each do |key, value|
                event[key] = value[0]
            end

        "
    }

    # remove unnecessary fields from the output
    mutate {
        remove_field => ["message", "parsed", "host", "port", "tags"]
    }

}

##########
# OUTPUT
##########

output {

    # forward event to the elasticsearch host
    # elasticsearch {
        # hosts => ["elasticsearch"]
    # }

    # write event on stdout for debugging
    stdout {
        codec => rubydebug
    }
}

To test this, save the XML content above to a file, start Logstash with the configuration provided, and send the XML content to Logstash with cat filename.xml | nc <logstash_ip_or_hostname> 9000.

This produces the following output in Logstash:

{
               "@timestamp" => "2016-05-03T11:51:39.777Z",
                 "@version" => "1",
                  "eventId" => "111999_0",
    "something_interesting" => {
               "some_attribute" => "foo",
              "other_attribute" => "bar",
        "yet_another_attribute" => "whatever"
    },
           "eventStartTime" => "2016-01-22T04:00:00Z",
            "eventStopTime" => "2016-02-19T18:00:00Z",
                 "location" => {
              "loc_attribute" => "fooz",
        "other_loc_attribute" => "unknown",
               "and_one_more" => "hooray",
                         "xy" => [
            [0] {
                      "lat" => "51.514728",
                      "lon" => "-0.073563",
                     "name" => "some_name",
                "direction" => "north"
            }
        ]
    },
                  "comment" => {
        "language" => "en",
         "content" => "Some text comment."
    }
}

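As it stands, the event contains string values (e.g. eventId), object values (e.g. something_interesting), and arrays of objects (e.g. location => xy). I would rather have the final event be flat than nested, because handling nested data in Elasticsearch and Kibana causes some problems.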

In addition, the original XML content has two <comment> tags, but for some reason the second one does not make it into the output.
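A possible explanation, sketched in plain Ruby outside Logstash: if the xml filter wraps every child element in an array (an assumption about the shape of event['parsed']['Event'], based on the value[0] indexing in my ruby filter), then value[0] keeps only the first element of each array. The hash below is a hand-written stand-in for the parsed event, not real Logstash output.

```ruby
# Assumed shape of event['parsed']['Event'] after the split filter:
# every child element is wrapped in an array, and repeated tags such as
# <comment> become arrays with more than one element.
parsed_event = {
  "eventId" => ["111999_0"],
  "comment" => [
    { "language" => "en", "content" => "Some text comment." },
    { "language" => "en", "content" => "Some other text comment." }
  ]
}

# Same indexing as in the ruby filter of the configuration above.
flattened = {}
parsed_event.each do |key, value|
  flattened[key] = value[0] # only the first array element survives
end

puts flattened.inspect
```

If that assumption holds, any tag that occurs more than once loses everything after its first occurrence, which would match the missing second comment.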

I would like the output to look like this:

{
    "@timestamp" => "2016-05-03T12:00:54.182Z",
    "@version" => "1",
    "eventId" => "111999_0",
    "something_interesting.some_attribute" => "foo",
    "something_interesting.other_attribute" => "bar",
    "something_interesting.yet_another_attribute" => "whatever",
    "eventStartTime" => "2016-01-22T04:00:00Z",
    "eventStopTime" => "2016-02-19T18:00:00Z",
    "location.loc_attribute" => "fooz",
    "location.other_loc_attribute" => "unknown",
    "location.and_one_more" => "hooray",
    "xy.0.lat" => "51.514728",
    "xy.0.lon" => "-0.073563",
    "xy.0.name" => "some_name",
    "xy.0.direction" => "north",
    "comment.0.language" => "en",
    "comment.0.content" => "Some text comment.",
    "comment.1.language" => "en",
    "comment.1.content" => "Some other text comment.",
}

The separator in the keys does not have to be a dot; it could be anything else (I am not sure right now whether dots are even allowed).
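To make the desired transformation concrete, here is a minimal sketch in plain Ruby, independent of Logstash. The name flatten_event and its sep parameter are hypothetical and not part of any Logstash plugin, and the nested hash is a shortened, hand-written version of the event shown earlier.

```ruby
# Hypothetical helper (not a Logstash API): recursively flattens nested
# hashes and arrays into a single-level hash. Hash keys and array indexes
# are joined with `sep` to build the flat key.
def flatten_event(value, prefix = nil, sep = ".", out = {})
  case value
  when Hash
    value.each do |k, v|
      flatten_event(v, prefix ? "#{prefix}#{sep}#{k}" : k.to_s, sep, out)
    end
  when Array
    value.each_with_index do |v, i|
      flatten_event(v, prefix ? "#{prefix}#{sep}#{i}" : i.to_s, sep, out)
    end
  else
    out[prefix] = value # leaf value: store it under the accumulated key
  end
  out
end

nested = {
  "eventId" => "111999_0",
  "location" => {
    "loc_attribute" => "fooz",
    "xy" => [{ "lat" => "51.514728", "lon" => "-0.073563" }]
  },
  "comment" => [
    { "language" => "en", "content" => "Some text comment." },
    { "language" => "en", "content" => "Some other text comment." }
  ]
}

flat = flatten_event(nested)
flat.each { |key, value| puts "#{key} => #{value}" }
```

This sketch keeps the full path in each key, so the coordinates end up under location.xy.0.lat rather than the shorter xy.0.lat from my wished-for output. Passing another separator, for example flatten_event(nested, nil, "_"), avoids dots in the keys.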

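Any suggestions on how to achieve this? Do I have to write a custom Ruby plugin for this transformation, or can it also be done with the built-in plugins?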

Flatten nested events/documents in Logstash

0 answers: