Processing JSON messages from a Kafka topic with Logstash filters

Asked: 2016-10-19 12:26:01

Tags: json elasticsearch filter apache-kafka logstash

I am using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch index.

The JSON format looks like this -

{
   "schema": {
      "type": "struct",
      "fields": [
         {
            "type": "string",
            "optional": false,
            "field": "reloadID"
         },
         {
            "type": "string",
            "optional": false,
            "field": "externalAccountID"
         },
         {
            "type": "int64",
            "optional": false,
            "name": "org.apache.kafka.connect.data.Timestamp",
            "version": 1,
            "field": "reloadDate"
         },
         {
            "type": "int32",
            "optional": false,
            "field": "reloadAmount"
         },
         {
            "type": "string",
            "optional": true,
            "field": "reloadChannel"
         }
      ],
      "optional": false,
      "name": "reload"
   },
   "payload": {
      "reloadID": "328424295",
      "externalAccountID": "9831200013",
      "reloadDate": 1446242463000,
      "reloadAmount": 240,
      "reloadChannel": "C1"
   }
}

Without any filter in my config file, the target document in the ES index looks like this -

{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfcyTU4SyCFNFP2z5-l",
  "_score" : 1.0,
  "_source" : {
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "string",
        "optional" : false,
        "field" : "reloadID"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "externalAccountID"
      }, {
        "type" : "int64",
        "optional" : false,
        "name" : "org.apache.kafka.connect.data.Timestamp",
        "version" : 1,
        "field" : "reloadDate"
      }, {
        "type" : "int32",
        "optional" : false,
        "field" : "reloadAmount"
      }, {
        "type" : "string",
        "optional" : true,
        "field" : "reloadChannel"
      } ],
      "optional" : false,
      "name" : "reload"
    },
    "payload" : {
      "reloadID" : "155559213",
      "externalAccountID" : "9831200014",
      "reloadDate" : 1449529746000,
      "reloadAmount" : 140,
      "reloadChannel" : "C1"
    },
    "@version" : "1",
    "@timestamp" : "2016-10-19T11:56:09.973Z",
  }
}

However, I want only the value part of the "payload" field to go to my ES index as the target JSON body. So I tried using a 'mutate' filter in my config file, as below -

input {
   kafka {
      zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
      group_id => "logstash"
      topic_id => "reload"
      consumer_threads => 3
   }
}
filter {
   mutate {
      remove_field => [ "schema", "@version", "@timestamp" ]
   }
}
output {
   elasticsearch {
      hosts => ["datanode-6:9200","datanode-2:9200"]
      index => "kafka_reloads"
   }
}

With this filter, the ES document now looks like this -

{
      "_index" : "kafka_reloads",
      "_type" : "logs",
      "_id" : "AVfch0yhSyCFNFP2z59f",
      "_score" : 1.0,
      "_source" : {
        "payload" : {
          "reloadID" : "850846698",
          "externalAccountID" : "9831200013",
          "reloadDate" : 1449356706000,
          "reloadAmount" : 30,
          "reloadChannel" : "C1"
        }
      }
}

But it should actually look like this -

{
      "_index" : "kafka_reloads",
      "_type" : "logs",
      "_id" : "AVfch0yhSyCFNFP2z59f",
      "_score" : 1.0,
      "_source" : {
          "reloadID" : "850846698",
          "externalAccountID" : "9831200013",
          "reloadDate" : 1449356706000,
          "reloadAmount" : 30,
          "reloadChannel" : "C1"
      }
}

Is there any way to do this? Can anyone help me out?

I have also tried the following filter -

filter {
   json {
      source => "payload"
   }
}

But it gives me an error like the following -

Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}

Any help would be much appreciated.

Thanks, Gautam Ghosh

2 answers:

Answer 0 (score: 5)

You can achieve what you want with the following ruby filter:

  ruby {
     code => "
        event.to_hash.delete_if {|k, v| k != 'payload'}
        event.to_hash.update(event['payload'].to_hash)
        event.to_hash.delete_if {|k, v| k == 'payload'}
     "
  }

What it does is:

  1. remove all fields except the payload one
  2. copy all the payload inner fields to the root level
  3. remove the payload field itself

You end up with exactly what you need.
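Note that this snippet relies on the Logstash 2.x event API, where the event can be read and modified like a hash. In Logstash 5.x and later, direct hash-style access was removed in favor of event.get/event.set, so a rough equivalent there might look like the following sketch (an untested adaptation, not part of the original answer):

  ruby {
     code => "
        # grab the already-parsed payload hash, then drop the wrapper fields
        payload = event.get('payload')
        event.remove('schema')
        event.remove('payload')
        # promote each payload key to the root of the event
        payload.each { |k, v| event.set(k, v) } if payload.is_a?(Hash)
     "
  }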

Answer 1 (score: 0)

It's been a while, but here is a valid workaround; I hope it will be useful.

json_encode {
  source => "json"
  target => "json_string"
}

json {
  source => "json_string"
}
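Note that json_encode is a separate plugin, which on Logstash 2.4 should be installable with bin/logstash-plugin install logstash-filter-json_encode. Adapted to this question's events, the idea is to serialize the already-parsed payload hash back into a string, re-parse that string at the event root, and then drop the leftovers. A minimal sketch (the payload_json field name is just an illustrative choice, not from the original answer):

filter {
  # serialize the already-parsed payload hash back into a JSON string
  json_encode {
    source => "payload"
    target => "payload_json"
  }
  # re-parse the string; with no `target`, the parsed keys land at the event root
  json {
    source => "payload_json"
  }
  # drop the wrapper fields once their contents have been promoted
  mutate {
    remove_field => [ "schema", "payload", "payload_json" ]
  }
}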