How to parse XML with Logstash and ignore the global (root) tag?

Posted: 2016-02-04 14:23:46

Tags: xml elasticsearch logstash

I'm using Logstash to parse XML files from my S3 bucket and send them to my Elasticsearch server. All of my XML is wrapped in a single root tag:

<ServiceSales xmlns="dmoes"> 
     <ServiceSalesDetailsClosed>...</ServiceSalesDetailsClosed> 
     <ServiceSalesDetailsClosed>...</ServiceSalesDetailsClosed>
</ServiceSales>

I want to ignore the outer tag, "ServiceSales". I tried the following:

Using "message.ServiceSales" as the source in my xml filter:

xml {
   source => "message.ServiceSales"
   target => "ro_detail"
}

This way my XML gets divided by ServiceSalesDetailsClosed, but the events are not parsed.

Ignoring the root tag and using the multiline codec instead:

codec => multiline {
    pattern => "<ServiceSalesDetailsClosed>"
    negate => "true"
    what => "previous"
}

This works, except for the first event, which is not parsed.

Do you know how I can do this?
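A likely reason the first event fails with the second approach: with negate => "true" and what => "previous", a new event starts at every line containing "<ServiceSalesDetailsClosed>", so the opening <ServiceSales xmlns="dmoes"> line is flushed as an event on its own, and that is not well-formed XML. In the same way, the closing root tag ends up appended to the last detail event. A minimal filter sketch that keeps the question's codec, drops the root-only fragment and strips the closing tag before parsing might look like this. It assumes one tag per line as in the sample; the "ro_detail" target is taken from the question.

```
filter {
  # With the question's multiline codec, the opening root tag arrives
  # as an event of its own; drop it because it is not complete XML.
  if [message] !~ /<ServiceSalesDetailsClosed>/ {
    drop { }
  }
  # The closing root tag is appended to the last detail event;
  # remove it so that event is well-formed before parsing.
  mutate {
    gsub => [ "message", "</ServiceSales>", "" ]
  }
  xml {
    source => "message"
    target => "ro_detail"
  }
}
```

If the last detail event only appears at shutdown, adding auto_flush_interval to the multiline codec, as the answer below does, may help.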

1 answer:

Answer 0 (score: 0)

I had a similar case. For this XML:

<ROOT number="34">
    <EVENT name="hey"/>
    <EVENT name="you"/>
</ROOT>

I use this Logstash configuration:

input {
  file {
    path => "/path/prueba.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
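      # Lines that don't match "<ROOT" are appended to the previous line,
      # so the whole document ends up in a single event
      pattern => "<ROOT"
      negate => true
      what => "previous"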
      auto_flush_interval => 1
    }
  }
}
filter {
  xml {
    source => "message"
    target => "xml_content"
  }
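  # Emit one event per element of the parsed EVENT array
  split {
    field => "[xml_content][EVENT]"
  }
  mutate {
    # Copy the root attribute and the current EVENT's attribute to top-level fields
    add_field => {
      "number" => "%{[xml_content][number]}"
      "name" => "%{[xml_content][EVENT][name]}"
    }
    remove_field => ["xml_content", "message", "path"]
  }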
}
output {
  stdout {
    codec => rubydebug
  }
}

I get this output:

{
        "number" => "34",
    "@timestamp" => 2016-12-23T12:20:35.587Z,
      "@version" => "1",
          "name" => "hey",
          "tags" => [
        [0] "multiline"
    ]
}
{
        "number" => "34",
    "@timestamp" => 2016-12-23T12:20:35.587Z,
      "@version" => "1",
          "name" => "you",
          "tags" => [
        [0] "multiline"
    ]
}
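Applied to the XML from the question, the same idea might look like the sketch below. It reads from a local file as this answer does; the path is a placeholder, and since the question's data comes from S3, the codec would go into that input instead. It assumes each file holds one ServiceSales document whose root tag starts at the beginning of a line. The pattern is "^<ServiceSales[ >]" rather than "<ServiceSales" because the shorter pattern would also match every "<ServiceSalesDetailsClosed>" line and start a new event there.

```
input {
  file {
    # Placeholder path for local testing
    path => "/path/service_sales.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # Only the root tag starts a new event; "[ >]" keeps the pattern
      # from also matching <ServiceSalesDetailsClosed> lines
      pattern => "^<ServiceSales[ >]"
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
  }
}
filter {
  xml {
    source => "message"
    target => "ro_detail"
  }
  # Emit one event per <ServiceSalesDetailsClosed> element
  split {
    field => "[ro_detail][ServiceSalesDetailsClosed]"
  }
  mutate {
    remove_field => ["message", "path"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
```

Each resulting event should carry one detail element under [ro_detail][ServiceSalesDetailsClosed]; the exact keys inside it depend on the child elements, which the question elides.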