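We have a type that contains a MAC address field. The data is loaded using the JDBC river.
The problem is that when we run a terms aggregation on the mac_address field, the results look as if the field had been broken up into separate index keys:
Action: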
GET index/type/_search?search_type=count
{
"aggs" : {
"uniqe_macs" : {
"terms" : {
"field" : "mac_address"
}
}
}
}
Result:
"aggregations": {
"uniqe_visitors": {
"buckets": [
{
"key": "00",
"doc_count": 1608759
},
{
"key": "10",
"doc_count": 674633
},
{
"key": "18",
"doc_count": 588591
},
{
"key": "f0",
"doc_count": 544897
},
{
"key": "60",
"doc_count": 538841
},
{
"key": "40",
"doc_count": 529085
},
{
"key": "08",
"doc_count": 523681
},
{
"key": "d0",
"doc_count": 515774
},
{
"key": "54",
"doc_count": 514771
},
{
"key": "04",
"doc_count": 509629
}
]
}
}
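How can I force Elasticsearch to map this field as a single value instead of breaking it up into keys?
Answer 0: (score: 4)
You can try the following mapping on the ES field mac_address, using a custom analyzer.
Define the analyzer: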
curl -XPUT http://localhost:9200/INDEX -d '
{
"settings" : {
"analysis" : {
"analyzer" : {
"my_edge_ngram_analyzer" : {
"tokenizer" : "my_edge_ngram_tokenizer"
}
},
"tokenizer" : {
"my_edge_ngram_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "17"
}
}
}
}
}'
Apply the mapping:
curl -XPUT http://localhost:9200/INDEX/TYPE/_mapping -d '
{
"TYPE": {
"properties": {
"mac_address": {
"type": "string",
"index_analyzer" : "my_edge_ngram_analyzer",
"search_analyzer": "keyword"
}
}
}
}'
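To get a feel for what the analyzer above would put into the index, here is a rough Python sketch of edge n-gram generation with min_gram 2 and max_gram 17. It is only an illustration, not the Lucene tokenizer: the helper name edge_ngrams and the sample address are made up, and it assumes the tokenizer keeps every character (including the colons), which I believe is the default when no token_chars are configured.

```python
def edge_ngrams(text, min_gram=2, max_gram=17):
    """Return the prefixes of `text` with lengths from min_gram to max_gram.

    Hypothetical helper that approximates an edge n-gram tokenizer,
    assuming no characters are filtered out.
    """
    upper = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, upper + 1)]


mac = "00:1a:2b:3c:4d:5e"  # made-up address, 17 characters long
tokens = edge_ngrams(mac)

# Show the shortest, the next, and the longest token.
print(tokens[0], tokens[1], tokens[-1])
```

Under that assumption every prefix of the address becomes an indexed term, and the full 17-character address is only one of them. That suits prefix matching with the keyword search analyzer, but it suggests a terms aggregation on this field may still return prefixes as bucket keys, so it is worth checking against your own data.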
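Answer 1: (score: 0)
For me, it was easier to define a raw multi-field for mac_address and set it to not_analyzed, as described here. It does not apply to data that is already indexed, but it does not require changing the index settings to add a new analyzer.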
curl -XPUT http://localhost:9200/INDEX/TYPE/_mapping -d'
{
"TYPE" : {
"properties" : {
"mac_address" : {
"type" : "string",
"fields":{
"raw" : {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Then, for the aggregation, you just use the field mac_address.raw:
curl -XPOST http://localhost:9200/INDEX/TYPE/_search?search_type=count -d'
{
"aggs" : {
"unique_macs" : {
"terms" : {
"field" : "mac_address.raw"
}
}
}
}'
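The difference between aggregating on the analyzed field and on the not_analyzed raw sub-field can be illustrated with a small Python sketch. It is a simplification: splitting on ":" only stands in for whatever the default analyzer actually does, and the function names and sample addresses are invented for the example.

```python
from collections import Counter


def analyzed_terms(value):
    # Simplified stand-in for an analyzed string field: lowercase the
    # value and split it on ":" so each octet becomes a separate term.
    return set(value.lower().split(":"))


def raw_terms(value):
    # Stand-in for a not_analyzed field: the whole value is one term.
    return {value}


def terms_buckets(docs, to_terms):
    # Count, per term, how many documents contain it. This mimics the
    # doc_count of a terms aggregation.
    counts = Counter()
    for value in docs:
        counts.update(to_terms(value))
    return counts


docs = ["00:1a:2b:3c:4d:5e", "00:1a:2b:3c:4d:5f", "10:1a:2b:3c:4d:5e"]  # made-up data

analyzed = terms_buckets(docs, analyzed_terms)
raw = terms_buckets(docs, raw_terms)

print(analyzed.most_common(3))  # buckets keyed by octets such as "1a" or "00"
print(raw.most_common(3))       # buckets keyed by full MAC addresses
```

In the first case the bucket keys are fragments like "00", which matches the shape of the result in the question. In the second case each bucket key is a complete address.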