Logstash XML parsing merges all field contents under one field?

Time: 2017-06-20 19:14:47

Tags: elasticsearch logstash

Logstash 2.4.1

I want to parse an XML file:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <catalog>
      <cd country="USA">
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <price>10.90</price>
      </cd>
      <cd country="UK">
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <price>10.0</price>
      </cd>
      <cd country="USA">
        <title>Greatest Hits</title>
        <artist>Dolly Parton</artist>
        <price>9.90</price>
      </cd>
    </catalog>

I want the output in this format, with one set of fields per <cd> element:

    "country" => "USA",
      "title" => [
        [0] "Empire Burlesque"
    ],
     "artist" => [
        [0] "Bob Dylan"
    ],
      "price" => [
        [0] "10.90"
    ],
    "country" => "UK",
      "title" => [
        [0] "Hide your heart"
    ],
     "artist" => [
        [0] "Bonnie Tyler"
    ],
      "price" => [
        [0] "10.0"
    ],
    and so on ...

But this is what I get instead:

 "country" => [
    [0] "USA",
    [1] "UK",
    [2] "USA"
],
     "title" => [
    [0] "Empire Burlesque",
    [1] "Hide your heart",
    [2] "Greatest Hits"
],
    "artist" => [
    [0] "Bob Dylan",
    [1] "Bonnie Tyler",
    [2] "Dolly Parton"
],
     "price" => [
    [0] "10.90",
    [1] "10.0",
    [2] "9.90"
]

My Logstash configuration is as follows:

input {
  file {
    path => "F:\logstash-2.4.0\logstash-2.4.0\bin\samplexml.xml"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      pattern => "^<\?cd.*\>"
      negate => true
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    xpath => [
      "/catalog/cd/@country", "country",
      "/catalog/cd/title/text()", "title",
      "/catalog/cd/artist/text()", "artist",
      "/catalog/cd/price/text()", "price"
    ]
    store_xml => false
    target => "doc"
  }
}
output {
  stdout { codec => rubydebug }
}
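
The output shown earlier is consistent with the whole file arriving as a single event, so that each XPath expression is evaluated against the full document and matches every <cd> at once. A minimal sketch of that effect, written in plain Python with the standard library's `xml.etree.ElementTree` rather than Logstash (the name `merged` is only illustrative, and the XML declaration is left out for brevity):

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question.
XML = """<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>10.0</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>"""

root = ET.fromstring(XML)

# Each query runs against the whole document, so every field
# collects the values of all three <cd> elements.
merged = {
    "country": [cd.get("country") for cd in root.findall("cd")],
    "title": [t.text for t in root.findall("cd/title")],
    "artist": [a.text for a in root.findall("cd/artist")],
    "price": [p.text for p in root.findall("cd/price")],
}

for field, values in merged.items():
    print(field, "=>", values)
```

The link between a country and its title survives only through list position, which is why the fields look merged.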

How can I get my desired output from the XML file above?

Thanks

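1 Answer:

Answer 0: (score: 0)

This is not what you asked for, but if you are willing to consider Foopipes as an alternative to Logstash, create a foopipes.yml in the current directory like this: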

pipelines:
  -
    when:
      - queue: started
    from:
      - readfile: ./input.xml
    do:
      - parsexml
      - select: $.catalog.cd[*]    # This is a Json path expression
      - map: ~
        country: "#{@country}"     # These are data binding expressions
        title: "#{title}"
        artist: "#{artist}"
        price: "#{price}"
    to:
      - log
    finally:
      - exit

Start it with:

docker run -v %CD%:/project aretera/foopipes

(On Windows, %CD% is replaced with the absolute path of the current directory.)
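
As I read them, the `select` and `map` steps in the pipeline amount to visiting each <cd> element in turn and building one record from it. A rough illustration of that per-element grouping, in plain Python with the standard library's `xml.etree.ElementTree` (this is neither Foopipes nor Logstash, and the `records` helper is a hypothetical name introduced here):

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (XML declaration omitted for brevity).
XML = """<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>10.0</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>"""


def records(xml_text):
    """Yield one dict per <cd> element, keeping its fields together."""
    root = ET.fromstring(xml_text)
    for cd in root.findall("cd"):
        yield {
            "country": cd.get("country"),      # attribute on <cd>
            "title": cd.findtext("title"),     # text of the child elements
            "artist": cd.findtext("artist"),
            "price": cd.findtext("price"),
        }


for rec in records(XML):
    print(rec)
```

Because the lookups are relative to a single <cd>, each record carries only its own country, title, artist, and price, which is the shape the question asks for.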