Elasticsearch ngram for the whole document

Date: 2016-06-23 15:51:06

Tags: php search elasticsearch n-gram

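I am currently building a contact search for a system based on PHP and MySQL. All data that should be searchable is synced to Elasticsearch, where a document looks like this: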

{
    "_index": "persons",
    "_type": "person",
    "_id": "705",
    "_version": 1,
    "_score": 1,
    "_source": {
        "firstname": "Jaida",
        "lastname": "Walter",
        "id": 705,
        "title": "Miss",
        "nickname": "",
        "gender": "female",
        "birthday": "1992-12-29",
        "companies": [
            "Mann, Bailey and Hills",
            "West PLC",
            "Keebler-Howe",
            "Hills LLC",
            "Toy, Gusikowski and Mohr",
            "Halvorson-Fadel",
            "Ratke PLC",
            "Turcotte-Franecki",
            "Bernier-Flatley",
            "Wisozk, Bernhard and Osinski"
        ],
        "emailAddresses": [],
        "addresses": [
            "Doe Street 47, 07691 JaneVille",
            "Doe Street 78, 84294 JaneVille",
            "Doe Street 37, 31698 JohnVille",
            "Doe Street 54, 62462 JaneVille",
            "Doe Street 31, 37672 JohnVille"
        ],
        "phoneNumbers": []
    }
}

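Now I want to let users search across all of these fields and use partial words in their queries. I researched the topic and came across the ngram filter. I tried to implement everything using the documentation and a few blog posts. These are the index settings I currently use: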

  $params = [
        'index' => SearchPerson::getIndex(),
        'body' => [
            'settings' => [
                "analysis" => [
                    "analyzer" => [
                        "ngram_analyzer" => [
                            "type" => "custom",
                            "tokenizer" => "whitespace",
                            "filter" => ["asciifolding", "lowercase", "ngram"]
                        ],
                        "whitespace_analyzer" => [
                            "type" => "custom",
                            "tokenizer" => "whitespace",
                            "filter" => [
                                "lowercase",
                                "asciifolding"
                            ]
                        ]
                    ],
                    "filter" => [
                        "ngram" => [
                            "type" => "ngram",
                            "min_gram" => 2,
                            "max_gram" => 20,
                            "token_chars" => [
                                "letter",
                                "digit",
                                "punctuation",
                                "symbol"
                            ]
                        ]
                    ]
                ]
            ],
            'mapping' => [
                'person' => [
                    '_all' => [
                        'analyzer' => 'ngram_analyzer',
                        'search_analyzer' => 'whitespace_analyzer'
                    ],
                    'properties' => [
                        'firstname' => [
                            'analyzer' => 'ngram_analyzer',
                            'search_analyzer' => 'whitespace_analyzer'
                        ]
                    ]
                ]
            ]
        ]
    ];

When I query the index, I always get the same document back instead of the one I am looking for.
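
One thing that may be worth checking in the settings above: the create-index API reads field mappings from a top-level "mappings" key (plural), while the array above uses 'mapping'. If the mapping is not applied, the fields are analyzed with the default analyzer and no ngrams are indexed. The following is only a minimal sketch of the body shape under that assumption, not a confirmed fix. The literal index name 'persons' and the explicit 'type' => 'string' entry are assumptions (the latter matches Elasticsearch 2.x, which was current when the question was asked), and the analysis block from the question still has to be copied into 'settings' for the analyzer names to resolve.

```php
<?php
// Sketch only: same structure as in the question, with the plural "mappings" key.
$params = [
    'index' => 'persons', // assumption; the question uses SearchPerson::getIndex()
    'body'  => [
        'settings' => [
            // Copy the "analysis" block from the question here unchanged;
            // the mapping below refers to the analyzers it defines.
        ],
        'mappings' => [ // plural, as expected by the create-index API
            'person' => [
                '_all' => [
                    'analyzer'        => 'ngram_analyzer',
                    'search_analyzer' => 'whitespace_analyzer',
                ],
                'properties' => [
                    'firstname' => [
                        'type'            => 'string', // assumption: ES 2.x field type
                        'analyzer'        => 'ngram_analyzer',
                        'search_analyzer' => 'whitespace_analyzer',
                    ],
                ],
            ],
        ],
    ],
];
```

Separately, "token_chars" is documented as an option of the ngram tokenizer rather than of the ngram token filter, so it may have no effect where it is placed in the settings above.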

Edit 1:

This is my query:

{
  "size": 10,
  "query": {
    "match": {
      "_all": {
        "query": "Terill",
        "minimum_should_match": "100%"
      }
    }
  }
}

I also tried querying different fields (e.g. firstname) and removing "minimum_should_match".

I get the following document back:

{
    "_index": "persons",
    "_type": "person",
    "_id": "701",
    "_version": 1,
    "_score": 1,
    "_source": {
        "firstname": "Austen",
        "lastname": "Braun",
        "id": 701,
        "title": "Mr.",
        "nickname": "",
        "gender": "male",
        "birthday": "2008-05-15",
        "companies": [
            "Abshire, Fadel and Kiehn",
            "Wolf-Bogan",
            "Kohler-Langosh",
            "Howe, Skiles and Boyer",
            "Rippin, Batz and Ondricka",
            "Gislason-Kirlin"
        ],
        "emailAddresses": [],
        "addresses": [
            "Doe Street 68, 28012 JaneVille",
            "Doe Street 12, 78992 JohnVille",
            "Doe Street 23, 75805 JaneVille",
            "Doe Street 95, 46066 JohnVille",
            "Doe Street 72, 28754 JohnVille"
        ],
        "phoneNumbers": []
    }
}

But with the search query above, I am looking for this one:

{
    "_index": "persons",
    "_type": "person",
    "_id": "712",
    "_version": 1,
    "_score": 1,
    "_source": {
        "firstname": "Terrill",
        "lastname": "Parker",
        "id": 712,
        "title": "Mr.",
        "nickname": "",
        "gender": "male",
        "birthday": "1970-07-12",
        "companies": [
            "Abshire, Fadel and Kiehn",
            "Zemlak Ltd",
            "McDermott, Schuppe and Mayer",
            "Ledner, Rosenbaum and Maggio",
            "Schoen PLC",
            "Hills LLC",
            "Tromp, Abernathy and Kuvalis",
            "Toy, Gusikowski and Mohr",
            "Hyatt-Flatley",
            "Rippin, Batz and Ondricka",
            "Jenkins-Corwin",
            "Ratke PLC"
        ],
        "emailAddresses": [
            "john66@doe.com",
            "pdoe@doe.info"
        ],
        "addresses": [
            "Doe Street 31, 35283 JaneVille",
            "Doe Street 78, 46721 JohnVille",
            "Doe Street 48, 39953 JohnVille",
            "Doe Street 24, 31388 JohnVille",
            "Doe Street 84, 35932 JohnVille"
        ],
        "phoneNumbers": [
            "747-058-484",
            "027-478-036"
        ]
    }
}
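
Note that the query text is "Terill" while the stored first name is "Terrill" (with a double r). With an ngram analyzer at index time and a non-ngram analyzer at search time, the whole lowercased query term has to equal one of the indexed grams. The sketch below approximates that in plain PHP, outside Elasticsearch. The helper name ngrams() is hypothetical, it works on bytes and so is only meaningful for ASCII input, and it is not meant to reproduce Lucene's token order or positions.

```php
<?php
// Hypothetical helper: approximates the grams an ngram token filter
// would emit for a single token (ASCII only, order not significant).
function ngrams(string $token, int $min, int $max): array
{
    $grams = [];
    $length = strlen($token);
    for ($start = 0; $start < $length; $start++) {
        for ($size = $min; $size <= $max && $start + $size <= $length; $size++) {
            $grams[] = substr($token, $start, $size);
        }
    }
    return array_values(array_unique($grams));
}

// Index side: lowercase + ngram with min_gram 2 and max_gram 20, as in the settings.
$indexed = ngrams(strtolower('Terrill'), 2, 20);

// Search side: the whitespace analyzer only lowercases, so the term stays whole.
var_dump(in_array(strtolower('Terill'), $indexed, true)); // bool(false)
var_dump(in_array(strtolower('Terri'), $indexed, true));  // bool(true)
```

Under these assumptions, "terill" is not a substring of "terrill", so this particular query would not match document 712 even with the ngram mapping applied; a query such as "Terri" or "rill" would be a fairer test of the partial matching.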

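Edit 2:

I found this document, which says that the analyzer mapping has changed. I updated my mapping but still get the same result. I also created a gist for the question.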

0 answers:

No answers yet