ElasticSearch partial search matching

Time: 2013-09-07 21:03:11

Tags: elasticsearch

I'm trying to use nGrams, synonyms, and similar features, but I'm having no luck.

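I have been following this blog post. I've tried adapting its mappings and queries to my data, but the search only ever matches exact terms. I also tried the exact data from the article, as given in this gist, with the same result.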

Here is the mapping:

{
   "mappings": {
      "item": {
         "properties": {
            "productName": {
               "fields": {
                  "partial": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name",
                     "type":"string"
                  },
                  "partial_back": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name_back",
                     "type":"string"
                  },
                  "partial_middle": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_middle_name",
                     "type":"string"
                  },
                  "productName": {
                     "type":"string",
                     "analyzer":"full_name"
                  }
               },
               "type":"multi_field"
            },
            "productID": {
               "type":"string",
               "analyzer":"simple"
            },
            "warehouse": {
               "type":"string",
               "analyzer":"simple"
            },
            "vendor": {
               "type":"string",
               "analyzer":"simple"
            },
            "productDescription": {
               "type":"string",
               "analyzer":"full_name"
            },
            "categories": {
               "type":"string",
               "analyzer":"simple"
            },
            "stockLevel": {
               "type":"integer",
               "index":"not_analyzed"
            },
            "cost": {
               "type":"float",
               "index":"not_analyzed"
            }
         }
      },
      "settings": {
         "analysis": {
            "filter": {
               "name_ngrams": {
                  "side":"front",
                  "max_gram":50,
                  "min_gram":2,
                  "type":"edgeNGram"
               },
               "name_ngrams_back": {
                  "side":"back",
                  "max_gram":50,
                  "min_gram":2,
                  "type":"edgeNGram"
               },
               "name_middle_ngrams": {
                  "type":"nGram",
                  "max_gram":50,
                  "min_gram":2
               }
            },
            "analyzer": {
               "full_name": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "partial_name": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding",
                     "name_ngrams"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "partial_name_back": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding",
                     "name_ngrams_back"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "partial_middle_name": {
                  "filter":[
                     "standard",
                     "lowercase",
                     "asciifolding",
                     "name_middle_ngrams"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               }
            }
         }
      }
   }
}

The search query (I removed the filters to try to get more results back):

{
   "size":20,
   "from":0,
   "sort":[
      "_score"
   ],
   "query": {
      "bool": {
         "should":[
            {
               "text": {
                  "productName": {
                     "boost":5,
                     "query":"test query",
                     "type":"phrase"
                  }
               }
            },
            {
               "text": {
                  "productName.partial": {
                     "boost":1,
                     "query":"test query"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_middle": {
                     "boost":1,
                     "query":"test query"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_back": {
                     "boost":1,
                     "query":"test query"
                  }
               }
            }
         ]
      }
   }
}

Using the query from the gist above, if I remove the following clause from the bool query:

"text":{
    "productName":{
        "boost":5,
        "query":"test query",
        "type":"phrase"
    }
} 

so that direct matches are no longer returned, I still get no results, whatever my search term is.

I think I'm missing something obvious, and I don't know what other information would be relevant, so please go easy on me.

1 answer:

Answer 0: (score: 5)

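It looks like I found the answer to my own problem, which came from blindly copying and pasting. The blog post I linked appears to be outdated, and the JSON for its commands no longer works properly (although no errors are thrown when the commands are sent).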

Here is the code I used to create the index:

{
   "settings": {
      "analysis": {
         "filter": {
            "name_ngrams": {
               "side":"front",
               "max_gram":50,
               "min_gram":2,
               "type":"edgeNGram"
            },
            "name_ngrams_back": {
               "side":"back",
               "max_gram":50,
               "min_gram":2,
               "type":"edgeNGram"
            },
            "name_middle_ngrams": {
               "type":"nGram",
               "max_gram":50,
               "min_gram":2
            }
         },
         "analyzer": {
            "full_name": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding"
               ],
               "type":"custom",
               "tokenizer":"standard"
            },
            "partial_name": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "name_ngrams"
               ],
               "type":"custom",
               "tokenizer":"standard"
            },
            "partial_name_back": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "name_ngrams_back"
               ],
               "type":"custom",
               "tokenizer":"standard"
            },
            "partial_middle_name": {
               "filter":[
                  "standard",
                  "lowercase",
                  "asciifolding",
                  "name_middle_ngrams"
               ],
               "type":"custom",
               "tokenizer":"standard"
            }
         }
      }
   },
   "mappings" : {
      "product": {
         "properties": {
            "productName": {
               "fields": {
                  "partial": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name",
                     "type":"string"
                  },
                  "partial_back": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_name_back",
                     "type":"string"
                  },
                  "partial_middle": {
                     "search_analyzer":"full_name",
                     "index_analyzer":"partial_middle_name",
                     "type":"string"
                  },
                  "productName": {
                     "type":"string",
                     "analyzer":"full_name"
                  }
               },
               "type":"multi_field"
            },
            "productID": {
               "type":"string",
               "analyzer":"simple"
            },
            "warehouse": {
               "type":"string",
               "analyzer":"simple"
            },
            "vendor": {
               "type":"string",
               "analyzer":"simple"
            },
            "productDescription": {
               "type":"string",
               "analyzer":"full_name"
            },
            "categories": {
               "type":"string",
               "analyzer":"simple"
            },
            "stockLevel": {
               "type":"integer",
               "index":"not_analyzed"
            },
            "cost": {
               "type":"float",
               "index":"not_analyzed"
            }
         }
      }
   }
}
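To see why the three sub-fields behave differently, it can help to approximate what each filter emits for a single value. The following is only a rough pure-Python sketch, not Elasticsearch's implementation: the helper names are made up for illustration, `str.split` stands in for the standard tokenizer, and ASCII folding is ignored. It uses the same `min_gram` of 2 and `max_gram` of 50 as the settings above.

```python
# Pure-Python approximation of the three token filters defined above.
# It only mimics gram generation; tokenizing and folding are simplified.

def edge_ngrams(token, min_gram=2, max_gram=50, side="front"):
    """Grams anchored at the start ("front") or end ("back") of the token."""
    upper = min(max_gram, len(token))
    if side == "front":
        return [token[:n] for n in range(min_gram, upper + 1)]
    return [token[-n:] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram=2, max_gram=50):
    """Every substring whose length lies between min_gram and max_gram."""
    upper = min(max_gram, len(token))
    return [
        token[i:i + n]
        for n in range(min_gram, upper + 1)
        for i in range(len(token) - n + 1)
    ]


def index_tokens(text):
    """Approximate token sets for each productName sub-field."""
    words = text.lower().split()  # stand-in for the standard tokenizer
    return {
        "productName.partial": {g for w in words for g in edge_ngrams(w)},
        "productName.partial_back": {
            g for w in words for g in edge_ngrams(w, side="back")
        },
        "productName.partial_middle": {g for w in words for g in ngrams(w)},
    }


tokens = index_tokens("Thingey")
matching_fields = sorted(f for f, grams in tokens.items() if "ing" in grams)
print(matching_fields)  # ['productName.partial_middle']
```

Under these assumptions, a search for "ing" against a product named "Thingey" would be expected to hit only through `productName.partial_middle`, while "thi" would hit the front grams and "ey" the back grams.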

Here is the code I used to insert test records (I ran it 3 times with slightly modified data):

{
    "productName": "Thingey",
    "productID": "asdfasef9816",
    "warehouse": "usa",
    "vendor": "Cool Things Inc",
    "productDescription": "This is a cool gizmo",
    "categories": "Cool Things",
    "stockLevel": 6,
    "cost": 15.31
}

Finally, here is the JSON for the search query:

{
   "size":20,
   "from":0,
   "sort":[
      "_score"
   ],
   "query": {
      "bool": {
         "should":[
            {
               "text": {
                  "productName.partial": {
                     "boost":1,
                     "query":"ing"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_middle": {
                     "boost":1,
                     "query":"ing"
                  }
               }
            },
            {
               "text": {
                  "productName.partial_back": {
                     "boost":1,
                     "query":"ing"
                  }
               }
            }
         ]
      }
   }
}
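The bodies above are shown without the HTTP calls that carry them. As a sketch only, the three requests (create the index, insert a record, search) might be assembled like this with Python's standard library. The host, the index name `products`, and the helper names are assumptions for illustration; the type name `product` comes from the mapping above. Nothing is sent here, the requests are only built and printed.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a default local node
INDEX = "products"                  # hypothetical index name
DOC_TYPE = "product"                # type name used in the mapping above


def build_request(method, path, body):
    """Build (but do not send) a JSON request for the given path."""
    return Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def build_requests(index_body, document, search_body):
    """Return the three requests in the order they would be sent."""
    return [
        # Settings and mappings travel together in the index-creation PUT.
        build_request("PUT", f"/{INDEX}", index_body),
        # POST without an id lets the server assign one to the record.
        build_request("POST", f"/{INDEX}/{DOC_TYPE}", document),
        build_request("POST", f"/{INDEX}/{DOC_TYPE}/_search", search_body),
    ]


requests_to_send = build_requests(
    {"settings": {}, "mappings": {}},     # placeholder for the index body
    {"productName": "Thingey"},           # trimmed version of the test record
    {"query": {"bool": {"should": []}}},  # placeholder for the search body
)
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Each request could then be passed to `urllib.request.urlopen` against a running node, with the placeholders replaced by the full JSON bodies.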

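The key change I had to make was moving the settings out of the mapping PUT and into the index creation request. I also moved the initial mapping definition there, but it could just as well be created with the regular /index/item/_mapping PUT.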
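Because the misplaced settings produced no error, a small local check before sending the body can catch this kind of mistake. The sketch below is a hypothetical helper, not part of any Elasticsearch client: it collects the analyzer names referenced in `mappings` and reports those that are not defined under a top-level `settings.analysis.analyzer`. The list of built-in analyzers is deliberately partial and is an assumption.

```python
# Partial list of built-in analyzer names (assumption, extend as needed).
BUILT_IN = {"standard", "simple", "whitespace", "keyword", "stop"}
ANALYZER_KEYS = ("analyzer", "index_analyzer", "search_analyzer")


def referenced_analyzers(node):
    """Collect analyzer names referenced anywhere below this node."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ANALYZER_KEYS and isinstance(value, str):
                found.add(value)
            else:
                found |= referenced_analyzers(value)
    return found


def undefined_analyzers(body):
    """Analyzers used in mappings but not defined in top-level settings."""
    mappings = dict(body.get("mappings", {}))
    mappings.pop("settings", None)  # a nested settings block does not count
    analysis = body.get("settings", {}).get("analysis", {})
    defined = set(analysis.get("analyzer", {}))
    return sorted(referenced_analyzers(mappings) - defined - BUILT_IN)


# Trimmed version of the body from the question: settings inside mappings.
broken = {
    "mappings": {
        "item": {
            "properties": {
                "productName": {"type": "string", "analyzer": "full_name"}
            }
        },
        "settings": {
            "analysis": {
                "analyzer": {
                    "full_name": {"type": "custom", "tokenizer": "standard"}
                }
            }
        },
    }
}
# Same content with settings moved to the top level, as in the answer.
fixed = {
    "settings": broken["mappings"]["settings"],
    "mappings": {"item": broken["mappings"]["item"]},
}

print(undefined_analyzers(broken))  # ['full_name']
print(undefined_analyzers(fixed))   # []
```

A non-empty result suggests that the custom analyzers would not be registered when the index is created, which is consistent with the exact-match-only behaviour described in the question.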

If any ElasticSearch pros want to expand on this for future readers of this question, please do.