Logstash: XML to JSON output, from arrays to strings

Time: 2015-08-07 14:27:44

Tags: json xml elasticsearch logstash

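I am trying to use Logstash to convert XML into JSON for ElasticSearch. I am able to read the values and send them to ElasticSearch. The problem is that all of the values come out as arrays, and I would like them to come out as plain strings. I know I can do a replace for each field individually, but then I run into a problem with nested fields that are 3 levels deep.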

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
    <acs2:locationId>Location Id</acs2:locationId>
    <acs2:userId>User Id</acs2:userId>
    <acs2:TestResult>
        <acs1:CreatedBy>My Name</acs1:CreatedBy>
        <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
        <acs1:Output>10.5</acs1:Output>
    </acs2:TestResult>
</acs2:SubmitTestResult>

Logstash configuration

input {
    file {
        path => "/var/log/logstash/test.xml"
    }
}
filter {
    multiline {
        pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
        what => "previous"
    }
    if "multiline" in [tags] {
        mutate {
            replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
        }
        xml {
            target => "SubmitTestResult"
            source => "message"
        }
        mutate {
            remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
            remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]

            # This works
            replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]

            # This does NOT work
            replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
        }
    }
}
output {
    stdout {
        codec => "rubydebug"
    }
    elasticsearch {
        index => "xmltest"
        cluster => "logstash"
    }
}

Sample output

{
   "_index": "xmltest",
   "_type": "logs",
   "_id": "AU8IZBURkkRvuur_3YDA",
   "_version": 1,
   "found": true,
   "_source": {
      "SubmitTestResult": {
         "locationId": "Location Id",
         "userId": [
            "User Id"
         ],
         "TestResult": [
            {
               "CreatedBy": [
                  "My Name"
               ],
               "CreatedDate": [
                  "2015-08-07"
               ],
               "Output": [
                  "10.5"
               ]
            }
         ]
      }
   }
}

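As you can see, the output is an array for every element (except locationId, which I replaced). I am trying to avoid having to do a replace for each element. Is there a way to adjust the config so the output comes out correctly? If not, how do I reach 3 levels deep in the replace?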

- UPDATE -

I figured out how to reach the third level inside TestResult. The replace is:

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
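
Applying the same pattern to the remaining fields from the sample output might look like the sketch below. It assumes each element occurs exactly once per document, so index 0 is always the right one; the field names are taken from the sample output above and nothing else is implied about the real data.

```
mutate {
    # Sketch only: TestResult is parsed as an array, so every nested
    # field is reached through an explicit [0] index.
    # Assumes a single TestResult element per document.
    replace => {
        "[SubmitTestResult][userId]" => "%{[SubmitTestResult][userId]}"
        "[SubmitTestResult][TestResult][0][CreatedBy]" => "%{[SubmitTestResult][TestResult][0][CreatedBy]}"
        "[SubmitTestResult][TestResult][0][CreatedDate]" => "%{[SubmitTestResult][TestResult][0][CreatedDate]}"
        "[SubmitTestResult][TestResult][0][Output]" => "%{[SubmitTestResult][TestResult][0][Output]}"
    }
}
```

Note that this only turns the leaf values into strings; TestResult itself would still be a one-element array in the output.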

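1 answer:

Answer 0: (score: 1)

I figured it out. Because the xml filter wraps TestResult itself in an array, the field reference needs an explicit [0] index. Here is the solution: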

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
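
For the first half of the question (avoiding a replace per element), one option that may be worth trying is the xml filter's force_array setting. The sketch below assumes the installed version of the xml filter plugin supports that option; plugin versions from around the time of this question may not, so treat it as something to verify rather than a confirmed fix for this setup.

```
filter {
    xml {
        source => "message"
        target => "SubmitTestResult"
        # Assumption: this plugin version supports force_array.
        # It defaults to true, which wraps single elements in arrays;
        # false is meant to leave single elements as plain values.
        force_array => false
    }
}
```

With this setting the nested field would be expected at [SubmitTestResult][TestResult][CreatedBy], without the [0] index, and the per-field replace entries should no longer be needed. Elements that genuinely repeat in the XML would still come out as arrays.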