I'm using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch index. The JSON format is as follows -
{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "reloadID"
      },
      {
        "type": "string",
        "optional": false,
        "field": "externalAccountID"
      },
      {
        "type": "int64",
        "optional": false,
        "name": "org.apache.kafka.connect.data.Timestamp",
        "version": 1,
        "field": "reloadDate"
      },
      {
        "type": "int32",
        "optional": false,
        "field": "reloadAmount"
      },
      {
        "type": "string",
        "optional": true,
        "field": "reloadChannel"
      }
    ],
    "optional": false,
    "name": "reload"
  },
  "payload": {
    "reloadID": "328424295",
    "externalAccountID": "9831200013",
    "reloadDate": 1446242463000,
    "reloadAmount": 240,
    "reloadChannel": "C1"
  }
}
Without any filter in my config file, the target document in the ES index looks like this -
{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfcyTU4SyCFNFP2z5-l",
  "_score" : 1.0,
  "_source" : {
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "string",
        "optional" : false,
        "field" : "reloadID"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "externalAccountID"
      }, {
        "type" : "int64",
        "optional" : false,
        "name" : "org.apache.kafka.connect.data.Timestamp",
        "version" : 1,
        "field" : "reloadDate"
      }, {
        "type" : "int32",
        "optional" : false,
        "field" : "reloadAmount"
      }, {
        "type" : "string",
        "optional" : true,
        "field" : "reloadChannel"
      } ],
      "optional" : false,
      "name" : "reload"
    },
    "payload" : {
      "reloadID" : "155559213",
      "externalAccountID" : "9831200014",
      "reloadDate" : 1449529746000,
      "reloadAmount" : 140,
      "reloadChannel" : "C1"
    },
    "@version" : "1",
    "@timestamp" : "2016-10-19T11:56:09.973Z"
  }
}
However, I only want the value part of the "payload" field to go into my ES index as the target JSON body. So I tried using the 'mutate' filter in my config file, as below -
input {
  kafka {
    zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
    group_id => "logstash"
    topic_id => "reload"
    consumer_threads => 3
  }
}
filter {
  mutate {
    remove_field => [ "schema", "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => ["datanode-6:9200", "datanode-2:9200"]
    index => "kafka_reloads"
  }
}
With this filter, the ES documents now look like this -
{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfch0yhSyCFNFP2z59f",
  "_score" : 1.0,
  "_source" : {
    "payload" : {
      "reloadID" : "850846698",
      "externalAccountID" : "9831200013",
      "reloadDate" : 1449356706000,
      "reloadAmount" : 30,
      "reloadChannel" : "C1"
    }
  }
}
But they should actually look like this -
{
  "_index" : "kafka_reloads",
  "_type" : "logs",
  "_id" : "AVfch0yhSyCFNFP2z59f",
  "_score" : 1.0,
  "_source" : {
    "reloadID" : "850846698",
    "externalAccountID" : "9831200013",
    "reloadDate" : 1449356706000,
    "reloadAmount" : 30,
    "reloadChannel" : "C1"
  }
}
Is there a way to do this? Can anyone help me out?
I have also tried the following filter -
filter {
  json {
    source => "payload"
  }
}
But this gives me an error like -

Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}
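The failure can be sketched in plain Ruby: judging from the `:raw` value in the error above, the payload field has already been deserialized into a hash by the input codec, while JSON parsing expects a string (the hash below is an illustrative stand-in, not the real Logstash event):

```ruby
require 'json'

# Illustrative stand-in for the already-parsed payload field
payload = { 'reloadID' => '572584696', 'reloadAmount' => 180 }

# Parsing a hash fails, analogous to the json filter error above
begin
  JSON.parse(payload)
rescue TypeError => e
  puts "parse failed: #{e.class}"
end

# Re-serializing first makes it parseable again
fields = JSON.parse(payload.to_json)
puts fields['reloadID']
```

This is also why the json_encode workaround at the bottom of this page works: it turns the hash back into a string before the json filter runs.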
Any help would be much appreciated.

Thanks,
Gautam Ghosh
Answer 0: (score: 5)
You can achieve what you want using the following ruby filter:
ruby {
  code => "
    event.to_hash.delete_if {|k, v| k != 'payload'}
    event.to_hash.update(event['payload'].to_hash)
    event.to_hash.delete_if {|k, v| k == 'payload'}
  "
}
What it does is:

1. remove all the fields except `payload`
2. copy all of `payload`'s inner fields to the root level
3. delete the `payload` field itself

You'll end up with exactly what you need.
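Outside of Logstash, the same three-step hash surgery can be sketched in plain Ruby (the `event` hash below is a hypothetical stand-in for a real Logstash event):

```ruby
# Hypothetical stand-in for a Logstash event's top-level hash
event = {
  'schema'     => { 'type' => 'struct', 'name' => 'reload' },
  'payload'    => { 'reloadID' => '850846698', 'reloadAmount' => 30 },
  '@version'   => '1',
  '@timestamp' => '2016-10-19T11:56:09.973Z'
}

# 1. remove all fields except 'payload'
event.delete_if { |k, _v| k != 'payload' }
# 2. copy payload's inner fields to the root level
event.update(event['payload'])
# 3. delete the 'payload' wrapper itself
event.delete_if { |k, _v| k == 'payload' }

p event.keys  # → ["reloadID", "reloadAmount"]
```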
Answer 1: (score: 0)
It's been a while, but here there is a valid workaround; hope it will be useful.
json_encode {
  source => "json"
  target => "json_string"
}
json {
  source => "json_string"
}
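Adapted to this question's field names, the workaround's filter section might look like the sketch below (assumptions: the logstash-filter-json_encode plugin is installed, and a final mutate drops the leftover wrapper fields; the `payload_json` field name is arbitrary):

```
filter {
  # serialize the already-parsed payload hash back into a string
  json_encode {
    source => "payload"
    target => "payload_json"
  }
  # parse the string, placing the inner fields at the root level
  json {
    source => "payload_json"
  }
  # drop the fields that are no longer needed
  mutate {
    remove_field => [ "schema", "payload", "payload_json", "@version", "@timestamp" ]
  }
}
```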