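I am currently building a contact search for a system based on PHP and MySQL. All data that should be searchable is synced to Elasticsearch, and a document looks like this: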
{
  "_index": "persons",
  "_type": "person",
  "_id": "705",
  "_version": 1,
  "_score": 1,
  "_source": {
    "firstname": "Jaida",
    "lastname": "Walter",
    "id": 705,
    "title": "Miss",
    "nickname": "",
    "gender": "female",
    "birthday": "1992-12-29",
    "companies": [
      "Mann, Bailey and Hills",
      "West PLC",
      "Keebler-Howe",
      "Hills LLC",
      "Toy, Gusikowski and Mohr",
      "Halvorson-Fadel",
      "Ratke PLC",
      "Turcotte-Franecki",
      "Bernier-Flatley",
      "Wisozk, Bernhard and Osinski"
    ],
    "emailAddresses": [],
    "addresses": [
      "Doe Street 47, 07691 JaneVille",
      "Doe Street 78, 84294 JaneVille",
      "Doe Street 37, 31698 JohnVille",
      "Doe Street 54, 62462 JaneVille",
      "Doe Street 31, 37672 JohnVille"
    ],
    "phoneNumbers": []
  }
}
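Now I want to let users search across all of these fields and use partial words in their queries. I researched the topic and came across the ngram filter. I tried to implement everything using the documentation and a few blog posts. I currently use the following index settings: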
$params = [
    'index' => SearchPerson::getIndex(),
    'body' => [
        'settings' => [
            "analysis" => [
                "analyzer" => [
                    "ngram_analyzer" => [
                        "type" => "custom",
                        "tokenizer" => "whitespace",
                        "filter" => ["asciifolding", "lowercase", "ngram"]
                    ],
                    "whitespace_analyzer" => [
                        "type" => "custom",
                        "tokenizer" => "whitespace",
                        "filter" => [
                            "lowercase",
                            "asciifolding"
                        ]
                    ]
                ],
                "filter" => [
                    "ngram" => [
                        "type" => "ngram",
                        "min_gram" => 2,
                        "max_gram" => 20,
                        "token_chars" => [
                            "letter",
                            "digit",
                            "punctuation",
                            "symbol"
                        ]
                    ]
                ]
            ]
        ],
        'mapping' => [
            'person' => [
                '_all' => [
                    'analyzer' => 'ngram_analyzer',
                    'search_analyzer' => 'whitespace_analyzer'
                ],
                'properties' => [
                    'firstname' => [
                        'analyzer' => 'ngram_analyzer',
                        'search_analyzer' => 'whitespace_analyzer'
                    ]
                ]
            ]
        ]
    ]
];
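Two details in these settings are worth checking against the Elasticsearch documentation. The create index API documents the body key as `mappings` (plural), not `mapping`, and `token_chars` is documented as a parameter of the ngram tokenizer rather than the ngram token filter. Whether either of these explains the results is not confirmed here. The following is only a sketch of the same body with those two points adjusted, assuming Elasticsearch 1.x/2.x (typed mappings, `_all` field); the literal index name and the `'type' => 'string'` entry are assumptions added for illustration.

```php
<?php
// Sketch only, assuming Elasticsearch 1.x/2.x. Not a verified fix.
$params = [
    'index' => 'persons', // assumption: the name shown in "_index" above
    'body' => [
        'settings' => [
            'analysis' => [
                'analyzer' => [
                    // Used at index time: split on whitespace, then emit ngrams.
                    'ngram_analyzer' => [
                        'type' => 'custom',
                        'tokenizer' => 'whitespace',
                        'filter' => ['asciifolding', 'lowercase', 'ngram'],
                    ],
                    // Used at search time: no ngrams, so the query term is kept whole.
                    'whitespace_analyzer' => [
                        'type' => 'custom',
                        'tokenizer' => 'whitespace',
                        'filter' => ['lowercase', 'asciifolding'],
                    ],
                ],
                'filter' => [
                    // token_chars left out: it is a tokenizer parameter.
                    'ngram' => [
                        'type' => 'ngram',
                        'min_gram' => 2,
                        'max_gram' => 20,
                    ],
                ],
            ],
        ],
        // Plural key, as documented for the create index API.
        'mappings' => [
            'person' => [
                '_all' => [
                    'analyzer' => 'ngram_analyzer',
                    'search_analyzer' => 'whitespace_analyzer',
                ],
                'properties' => [
                    'firstname' => [
                        'type' => 'string', // assumption: pre-5.x field type
                        'analyzer' => 'ngram_analyzer',
                        'search_analyzer' => 'whitespace_analyzer',
                    ],
                ],
            ],
        ],
    ],
];
```

With the official PHP client, an array of this shape would be passed to `$client->indices()->create($params)`. Fetching the index mapping afterwards shows whether the analyzers were actually applied to `_all` and `firstname`.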
When I query the index, I always get the same document back instead of the one I am looking for.
Edit 1:
This is my query:
{
  "size": 10,
  "query": {
    "match": {
      "_all": {
        "query": "Terill",
        "minimum_should_match": "100%"
      }
    }
  }
}
I also tried querying different fields (e.g. firstname) and removing "minimum_should_match".
I get the following document back:
{
  "_index": "persons",
  "_type": "person",
  "_id": "701",
  "_version": 1,
  "_score": 1,
  "_source": {
    "firstname": "Austen",
    "lastname": "Braun",
    "id": 701,
    "title": "Mr.",
    "nickname": "",
    "gender": "male",
    "birthday": "2008-05-15",
    "companies": [
      "Abshire, Fadel and Kiehn",
      "Wolf-Bogan",
      "Kohler-Langosh",
      "Howe, Skiles and Boyer",
      "Rippin, Batz and Ondricka",
      "Gislason-Kirlin"
    ],
    "emailAddresses": [],
    "addresses": [
      "Doe Street 68, 28012 JaneVille",
      "Doe Street 12, 78992 JohnVille",
      "Doe Street 23, 75805 JaneVille",
      "Doe Street 95, 46066 JohnVille",
      "Doe Street 72, 28754 JohnVille"
    ],
    "phoneNumbers": []
  }
}
But with the search query above, I am looking for this one:
{
  "_index": "persons",
  "_type": "person",
  "_id": "712",
  "_version": 1,
  "_score": 1,
  "_source": {
    "firstname": "Terrill",
    "lastname": "Parker",
    "id": 712,
    "title": "Mr.",
    "nickname": "",
    "gender": "male",
    "birthday": "1970-07-12",
    "companies": [
      "Abshire, Fadel and Kiehn",
      "Zemlak Ltd",
      "McDermott, Schuppe and Mayer",
      "Ledner, Rosenbaum and Maggio",
      "Schoen PLC",
      "Hills LLC",
      "Tromp, Abernathy and Kuvalis",
      "Toy, Gusikowski and Mohr",
      "Hyatt-Flatley",
      "Rippin, Batz and Ondricka",
      "Jenkins-Corwin",
      "Ratke PLC"
    ],
    "emailAddresses": [
      "john66@doe.com",
      "pdoe@doe.info"
    ],
    "addresses": [
      "Doe Street 31, 35283 JaneVille",
      "Doe Street 78, 46721 JohnVille",
      "Doe Street 48, 39953 JohnVille",
      "Doe Street 24, 31388 JohnVille",
      "Doe Street 84, 35932 JohnVille"
    ],
    "phoneNumbers": [
      "747-058-484",
      "027-478-036"
    ]
  }
}
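Note that the query term is "Terill" (one r) while the stored firstname is "Terrill" (two r's). Because the search analyzer does not produce ngrams, the whole lowercased query term has to equal one of the indexed ngrams. The sketch below emulates that by hand to check whether "terill" is among the ngrams of "terrill"; the `ngrams` helper is hypothetical and only approximates what the ngram token filter emits for a single token.

```php
<?php
// Hypothetical helper: all substrings of $token with length $min..$max,
// roughly what an ngram token filter produces for one token.
function ngrams(string $token, int $min, int $max): array
{
    $grams = [];
    $len = strlen($token);
    for ($start = 0; $start < $len; $start++) {
        for ($size = $min; $size <= $max && $start + $size <= $len; $size++) {
            $grams[substr($token, $start, $size)] = true; // keys de-duplicate
        }
    }
    return array_keys($grams);
}

// Index side: lowercase + ngram (min_gram 2, max_gram 20).
$indexed = ngrams(strtolower('Terrill'), 2, 20);

// Search side: lowercase only, so the term stays whole.
$misspelled = in_array(strtolower('Terill'), $indexed, true);
$prefix     = in_array(strtolower('Terr'), $indexed, true);

var_dump($misspelled); // bool(false): "terill" is not a substring of "terrill"
var_dump($prefix);     // bool(true)
```

So even with the analyzers applied as intended, this particular query would need to be spelled "Terrill", or be a fragment such as "Terr", to match through the ngrams.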
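Edit 2:
I found this document, which says that the analyzer mapping has changed. I updated my mapping but still get the same results. I also created a gist for the problem.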