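We want to use Elasticsearch to find similar objects.
Suppose I have an object with 4 fields: product_name, seller_name, seller_phone, platform_id.
Similar products can have different product names and seller names on different platforms (fuzzy match).
The phone number, however, is strict: a single changed digit could yield a wrong record (strict match).
What I'm trying to create is a query:
If I wrote it in pseudocode, it would look something like:
((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)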
Answer 0 (score: 0)
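To get an exact match on seller_phone, I index this field without the ngram analyzer, while product_name and seller_name are indexed with it and searched with fuzzy queries.

Mapping:
PUT index111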
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_n_gram_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "ngram_filter"]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      }
    }
  },
  "mappings": {
    "document_type": {
      "properties": {
        "product_name": {
          "type": "text",
          "analyzer": "edge_n_gram_analyzer"
        },
        "seller_name": {
          "type": "text",
          "analyzer": "edge_n_gram_analyzer"
        },
        "seller_phone": {
          "type": "keyword"
        },
        "platform_id": {
          "type": "keyword"
        }
      }
    }
  }
}
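To see roughly what the analyzer above stores for product_name and seller_name, here is a small Python sketch of the whitespace tokenizer + lowercase + ngram (2 to 10) chain. It is only an approximation for illustration, not Elasticsearch's actual implementation; the function names `ngrams` and `analyze` are made up for this example.

```python
def ngrams(token, min_gram=2, max_gram=10):
    """Return every substring of `token` with a length between min_gram and max_gram."""
    token = token.lower()  # mirrors the "lowercase" filter
    grams = set()
    for size in range(min_gram, max_gram + 1):
        # range() is empty when size > len(token), so short tokens emit nothing
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def analyze(text):
    """Approximate the analyzer: split on whitespace, then ngram each token."""
    grams = set()
    for token in text.split():  # mirrors the "whitespace" tokenizer
        grams |= ngrams(token)
    return grams


if __name__ == "__main__":
    print(sorted(analyze("macbok")))
```

Because the full token ("macbok") is among the stored grams, a fuzzy query for "macbouk" (one edit away) can still match it.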
Index a document:
POST index111/document_type
{
  "product_name": "macbok",
  "seller_name": "apple",
  "seller_phone": "9988",
  "platform_id": "123"
}
For the following pseudo-SQL query:
((product_name like 'some_product_name') OR (seller_name like 'some_seller_name') OR (seller_phone = 'some_phone')) AND (platform_id = 123)
the Elasticsearch query is:
POST index111/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "platform_id": {
              "value": "123"
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "fuzzy": {
                  "product_name": {
                    "value": "macbouk",
                    "boost": 1.0,
                    "fuzziness": 2,
                    "prefix_length": 0,
                    "max_expansions": 100
                  }
                }
              },
              {
                "fuzzy": {
                  "seller_name": {
                    "value": "apdle",
                    "boost": 1.0,
                    "fuzziness": 2,
                    "prefix_length": 0,
                    "max_expansions": 100
                  }
                }
              },
              {
                "term": {
                  "seller_phone": {
                    "value": "9988"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
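If you need to build this query from code, a minimal Python sketch could look like the following. It only assembles the same request body as a dict; the helper name `build_similarity_query` and its parameters are hypothetical, and how you send the body to POST index111/_search depends on the client you use.

```python
import json


def build_similarity_query(product_name, seller_name, seller_phone, platform_id, fuzziness=2):
    """Build: (product_name ~ X OR seller_name ~ Y OR seller_phone = Z) AND platform_id = P."""

    def fuzzy(field, value):
        # Same fuzzy parameters as in the query above
        return {
            "fuzzy": {
                field: {
                    "value": value,
                    "boost": 1.0,
                    "fuzziness": fuzziness,
                    "prefix_length": 0,
                    "max_expansions": 100,
                }
            }
        }

    return {
        "query": {
            "bool": {
                "must": [
                    # AND part: the platform must match exactly
                    {"term": {"platform_id": {"value": platform_id}}},
                    # OR part: at least one of the three clauses must match
                    {
                        "bool": {
                            "should": [
                                fuzzy("product_name", product_name),
                                fuzzy("seller_name", seller_name),
                                {"term": {"seller_phone": {"value": seller_phone}}},
                            ]
                        }
                    },
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_similarity_query("macbouk", "apdle", "9988", "123")
    print(json.dumps(body, indent=2))
```

Since the inner bool sits inside must and has only should clauses, at least one of those clauses has to match, which gives the OR semantics of the pseudo-SQL.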
Hope this helps.