Logstash filter: emit every array element as a new event

Time: 2016-06-07 08:36:43

Tags: ruby xml elasticsearch logstash

I am trying to extract the array elements after the xml parser, as follows:

filter {
    xml {
        source => "message"
        target => "xmldata"
        store_xml => "false"
        xpath => ["/OMA/ESMLog/LogEntry/Index/text()","index"]
        xpath => ["/OMA/ESMLog/LogEntry/Status/text()","status"]
        xpath => ["/OMA/ESMLog/LogEntry/TimeStampRaw/text()","timestampraw"]
        xpath => ["/OMA/ESMLog/LogEntry/Description/text()","description"]
    }

    mutate {
        remove_field => [ "message", "inxml", "xmldata" ]
    }

    mutate {
        replace => {
            "index" => "%{[index][0]}"
            "status" => "%{[status][0]}"
            "timestampraw" => "%{[timestampraw][0]}"
            "description" => "%{[description][0]}"
        }
    }

    date {
        match => [ "timestampraw", "UNIX" ]
    }
}

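As you can see, I am able to get the first element of each array, but how can I turn all elements of the arrays into new events? In other words, I want each 'LogEntry' element in the XML to be treated as a new event. Here is some sample XML (the original XML from OMSA):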

<?xml version="1.0" encoding="UTF-8"?>
<OMA>
<ESMLog>
    <LogEntry>
        <Index>0</Index>
        <Status>2</Status>
        <TimeStamp>Tue Nov  3 07:22:57 2015</TimeStamp>
        <TimeStampRaw>1446535377</TimeStampRaw>
        <Description>The system board Mem2 temperature is within range.</Description>
    </LogEntry>
    <LogEntry>
        <Index>1</Index>
        <Status>3</Status>
        <TimeStamp>System Boot</TimeStamp>
        <TimeStampRaw>1446535378</TimeStampRaw>
        <Description>The system board Mem2 temperature is less than the lower warning threshold.</Description>
    </LogEntry>
    <LogEntry>
        <Index>2</Index>
        <Status>2</Status>
        <TimeStamp>Mon Nov  2 14:17:09 2015</TimeStamp>
        <TimeStampRaw>1446473829</TimeStampRaw>
        <Description>Drive 0 is installed in disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>3</Index>
        <Status>4</Status>
        <TimeStamp>Mon Nov  2 14:17:04 2015</TimeStamp>
        <TimeStampRaw>1446473824</TimeStampRaw>
        <Description>Drive 0 is removed from disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>4</Index>
        <Status>2</Status>
        <TimeStamp>Mon Nov  2 14:15:54 2015</TimeStamp>
        <TimeStampRaw>1446473754</TimeStampRaw>
        <Description>Drive 0 is installed in disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>5</Index>
        <Status>4</Status>
        <TimeStamp>Mon Nov  2 13:58:54 2015</TimeStamp>
        <TimeStampRaw>1446472734</TimeStampRaw>
        <Description>Drive 0 is removed from disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>6</Index>
        <Status>2</Status>
        <TimeStamp>Fri Feb  5 11:07:27 2010</TimeStamp>
        <TimeStampRaw>1265368047</TimeStampRaw>
        <Description>Drive 0 is installed in disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>7</Index>
        <Status>2</Status>
        <TimeStamp>Fri Feb  5 11:07:08 2010</TimeStamp>
        <TimeStampRaw>1265368028</TimeStampRaw>
        <Description>Drive 0 in disk drive bay 1 is operating normally.</Description>
    </LogEntry>
    <LogEntry>
        <Index>8</Index>
        <Status>4</Status>
        <TimeStamp>Fri Feb  5 11:07:07 2010</TimeStamp>
        <TimeStampRaw>1265368027</TimeStampRaw>
        <Description>Drive 0 is removed from disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>9</Index>
        <Status>4</Status>
        <TimeStamp>Fri Jan 29 09:33:27 2010</TimeStamp>
        <TimeStampRaw>1264757607</TimeStampRaw>
        <Description>Fault detected on drive 0 in disk drive bay 1.</Description>
    </LogEntry>
    <LogEntry>
        <Index>10</Index>
        <Status>2</Status>
        <TimeStamp>Mon Feb 25 16:14:15 2008</TimeStamp>
        <TimeStampRaw>1203956055</TimeStampRaw>
        <Description>Log cleared.</Description>
    </LogEntry>
    <NumRecords>11</NumRecords>
</ESMLog>
<ObjStatus>2</ObjStatus>
<SMStatus>0</SMStatus>
</OMA>

Here is the solution I built from Jettro's example:

filter {
    xml {
        source => "message"
        target => "xmldata"
        store_xml => "false"
        xpath => ["/OMA/ESMLog//LogEntry","logentry"]
    }

    mutate {
        remove_field => [ "message", "inxml", "xmldata" ]
    }

    split {
        field => "[logentry]"
    }

    xml {
        source => "logentry"
        store_xml => "false"
        xpath => ["/LogEntry/Index/text()","index"]
        xpath => ["/LogEntry/Status/text()","status"]
        xpath => ["/LogEntry/TimeStampRaw/text()","timestampraw"]
        xpath => ["/LogEntry/Description/text()","description"]
    }

    mutate {
        replace => {
            "index" => "%{[index][0]}"
            "status" => "%{[status][0]}"
            "timestampraw" => "%{[timestampraw][0]}"
            "description" => "%{[description][0]}"
        }
    }

    date {
        match => [ "timestampraw", "UNIX" ]
    }

    mutate {
        remove_field => [ "logentry" , "timestampraw" ]
    }
}

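Once the split filter runs, it seems to create a "loop": each element of the array continues through the remaining filters as its own event. Thanks.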
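To make that "loop" behavior concrete, here is a minimal Python sketch of what splitting on an array field amounts to. It is only a simplified model, not Logstash's actual implementation: events are plain dicts, the function name `split_event` is made up for illustration, and the `host` value is hypothetical. It covers only the array case.

```python
import copy


def split_event(event, field):
    """Return one copy of `event` per element of the list stored in `field`."""
    values = event.get(field)
    if not isinstance(values, list):
        return [event]  # nothing to split; pass the event through unchanged
    clones = []
    for value in values:
        clone = copy.deepcopy(event)  # every other field is carried over
        clone[field] = value          # the list is replaced by a single element
        clones.append(clone)
    return clones


event = {
    "host": "server-01",  # hypothetical value, for illustration only
    "logentry": [
        "<LogEntry><Index>0</Index></LogEntry>",
        "<LogEntry><Index>1</Index></LogEntry>",
    ],
}

events = split_event(event, "logentry")
for e in events:
    print(e)
```

Each resulting dict holds a single `LogEntry` string, which is why the second xml filter in the config above can use XPath expressions rooted at `/LogEntry`.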

1 Answer:

Answer 0: (score: 0)

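Since your example is a bit lengthy, I made a simpler XML, but you should be able to get what you need from it. The trick is to use the split filter. Below are the configuration I used and the resulting output.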

# <result><logline><description>item 1</description></logline><logline><description>item 2</description></logline></result>
input {
    stdin{}
}
filter {
    xml {
        source => "message"
        store_xml => "false"
        xpath => ["/result/logline","loglines"]
        remove_field => [ "message", "host" ] 
    }
    split {
        field => "loglines"
    }
    xml {
        source => "loglines"
        store_xml => "false"
        xpath => ["/logline/description/text()","description"]
        remove_field => [ "loglines" ] 
    }    
}
output {
    stdout{ codec => rubydebug }
}

The output then becomes:

{
       "@version" => "1",
     "@timestamp" => "2016-06-07T09:40:35.420Z",
           "host" => "Jettros-MBP.fritz.box",
    "description" => [
        [0] "item 1"
    ]
}
{
       "@version" => "1",
     "@timestamp" => "2016-06-07T09:40:35.420Z",
           "host" => "Jettros-MBP.fritz.box",
    "description" => [
        [0] "item 2"
    ]
}

As you can see, there are now two events.
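If it helps to see the two-stage idea outside of Logstash, the following Python sketch mimics it with the standard library's `xml.etree.ElementTree`. It is an approximation for illustration only: the variable names are made up, and ElementTree's limited path syntax stands in for the XPath expressions in the config.

```python
import xml.etree.ElementTree as ET

# Same sample line as in the comment at the top of the config.
line = (
    "<result><logline><description>item 1</description></logline>"
    "<logline><description>item 2</description></logline></result>"
)

# Stage 1: like the first xml filter, keep every <logline> node as an XML string.
root = ET.fromstring(line)
loglines = [ET.tostring(node, encoding="unicode") for node in root.findall("logline")]

# Stage 2: after the split, each fragment is parsed on its own.
events = []
for fragment in loglines:
    node = ET.fromstring(fragment)
    # Matches are collected into a list, hence the one-element array in the output.
    descriptions = [d.text for d in node.findall("description")]
    events.append({"description": descriptions})

print(events)
```

Taking `descriptions[0]` here corresponds to the `"%{[description][0]}"` replacement used in the question, which turns the one-element array into a plain string.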