This is my index:
[
    'index' => 'proof',
    'body' => [
        'settings' => [
            'analysis' => [
                'tokenizer' => [
                    'ngram_tokenizer' => [
                        'type' => 'nGram',
                        'min_gram' => 1,
                        'max_gram' => 20,
                        'token_chars' => ['letter', 'digit'],
                    ],
                ],
                'analyzer' => [
                    'ngram_tokenizer_analyzer' => [
                        'type' => 'custom',
                        'tokenizer' => 'ngram_tokenizer',
                        'filter' => ['lowercase'],
                    ]
                ]
            ]
        ],
        'mappings' => [
            'proof_page' => [
                'properties' => [
                    'content' => [
                        'type' => 'multi_field',
                        'path' => 'just_name',
                        'fields' => [
                            'content' => [
                                'type' => 'string',
                                'analyzer' => 'ngram_tokenizer_analyzer',
                            ],
                            'untouched' => [
                                'type' => 'string'
                            ]
                        ]
                    ],
                    'proof_name' => [
                        'type' => 'string',
                    ],
                    'project_name' => [
                        'type' => 'string',
                    ],
                    'page_number' => [
                        'type' => 'integer',
                        'index' => 'not_analyzed',
                    ],
                    'proof_id' => [
                        'type' => 'string',
                        'index' => 'not_analyzed',
                    ],
                    'project_id' => [
                        'type' => 'string',
                        'index' => 'not_analyzed',
                    ]
                ]
            ]
        ]
    ]
]
Here is an example query:
[
    'index' => 'proof',
    'type' => 'proof_page',
    'body' => [
        'query' => [
            'filtered' => [
                'query' => [
                    'match_phrase' => [
                        'content' => [
                            'query' => 'Lorem Ipsum is simply dum',
                            'slop' => 0,
                        ],
                    ],
                ],
                'filter' => [
                    'term' => [
                        'proof_id' => '56ebea535f5e8841038b4569',
                    ],
                ],
            ],
        ],
        '_source' => false,
        'fields' => [
            'proof_id',
            'proof_name',
            'project_id',
            'project_name',
            'page_number',
        ],
        'highlight' => [
            'fields' => [
                'content' => [
                    'type' => 'plain',
                    'fragment_size' => 100,
                    'number_of_fragments' => 100,
                    'fragmenter' => 'simple',
                ]
            ]
        ],
        'from' => 0,
        'size' => 10,
        'sort' => [
            'page_number' => [
                'order' => 'asc',
            ]
        ]
    ]
]
Suppose one of my documents matching proof_id 56ebea535f5e8841038b4569 contains content like this:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s.
The result I expect is a returned fragment with the following highlighted:
Lorem Ipsum is simply dum
But it returns no matches. The same is true for:
Lorem Ipsum is simply du
Lorem Ipsum is simply dumm
It does, however, return matches for:
Lorem Ipsum is simply d
Lorem Ipsum is simply dummy
This makes no sense to me, because I can see every variation of "dummy" in the term vectors (the n-gram range is wide enough to cover all of them).
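To make that expectation concrete, here is a minimal sketch in plain PHP. It assumes the tokenizer simply emits every substring of a word whose length lies between min_gram and max_gram. The helper ngrams() is hypothetical and is not part of Elasticsearch or its PHP client. It only lists which grams should exist and says nothing about the order or positions that the real tokenizer assigns to them.

```php
<?php
// Hypothetical helper for illustration only: lists every substring of $word
// whose length is between $minGram and $maxGram (inclusive).
// Assumes single-byte (ASCII) input; multibyte text would need mb_* functions.
function ngrams(string $word, int $minGram, int $maxGram): array
{
    $word = strtolower($word); // mirrors the 'lowercase' filter in the analyzer
    $length = strlen($word);
    $grams = [];
    for ($start = 0; $start < $length; $start++) {
        // Grow the gram from $minGram up to $maxGram, stopping at the end of the word.
        for ($size = $minGram; $size <= $maxGram && $start + $size <= $length; $size++) {
            $grams[] = substr($word, $start, $size);
        }
    }
    return $grams;
}

// Same bounds as the index settings: min_gram = 1, max_gram = 20.
$grams = ngrams('dummy', 1, 20);
foreach (['d', 'du', 'dum', 'dumm', 'dummy'] as $prefix) {
    echo $prefix, ': ', in_array($prefix, $grams, true) ? 'present' : 'missing', PHP_EOL;
}
```

Under that assumption all five prefixes are reported as present, which is why I expected "du", "dum" and "dumm" to match just as "d" and "dummy" do.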
It is worth pointing out that this only happens with the term at the end of the search string. For example:
m Ipsum is simply d
em Ipsum is simply d
rem Ipsum is simply d
orem Ipsum is simply d
Lorem Ipsum is simply d
are all highlighted as expected.
Any help is greatly appreciated :)
Thanks all!
Ben