Current data:
"hits": {
  "total": 2,
  "max_score": 38.91894,
  "hits": [
    {
      "_index": "evg_dev",
      "_type": "component",
      "_id": "907784",
      "_score": 38.91894,
      "_source": {
        "component_type": "para",
        "qual_data_desc": "test_text_136",
        "last_changed_by": "testuserevg",
        "document_used": "",
        "element_detail": "<para><para>Tit fot tat tit</para></para><para/>",
        "datetime_created": "2018-10-16T12:31:33.932Z",
        "datetime_last_changed": "2018-10-16T13:13:15.372Z",
        "created_by": "testuserevg"
      }
    },
    {
      "_index": "evg_dev",
      "_type": "component",
      "_id": "907783",
      "_score": 37.329224,
      "_source": {
        "component_type": "para",
        "qual_data_desc": "test_evg_213",
        "last_changed_by": "testuserevg",
        "document_used": "",
        "element_detail": "<para><para>tit fot tat</para></para><para/>",
        "datetime_created": "2018-10-15T14:39:15.696Z",
        "datetime_last_changed": "2018-10-15T14:42:34.145Z",
        "created_by": "testuserevg"
      }
    }
  ]
}
Here is the mapping for this data, starting with the analyzer and then the field that uses it:
"term_vector_analyzer": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": ["asciifolding", "lowercase", "word_delimiter",
             "kstem", "english_stopwords"],
  "char_filter": ["html_strip"]
}
"element_detail": {
  "type": "text",
  "fields": {
    "kstem_words": {
      "type": "text",
      "analyzer": "term_vector_analyzer"
    }
  }
},
When we try to fetch results with a more_like_this query, we do not get the correct results.
This is the query I am using:
{
  "query": {
    "more_like_this": {
      "fields": ["element_detail"],
      "analyzer": "html_analyzer_without_tags",
      "like": "Tit fot tat tata",
      "min_term_freq": "1",
      "min_doc_freq": "1",
      "minimum_should_match": "10%"
    }
  }
}
And html_analyzer_without_tags is:
"html_analyzer_without_tags": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": ["asciifolding", "lowercase", "word_delimiter",
             "kstem", "stemmed_appasense_stopwords_filter"],
  "char_filter": ["no_escape_tag_char_filter"]
}
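We also added parameters such as min_doc_freq, but none of them had any effect. Is this related to the mapping settings, and is that why we are getting these wrong results?
We also checked whether the analyzer produces the correct tokens, and it does: it tokenizes the text and returns each word. Even lowering "minimum_should_match" to 1% did not work for us.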
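
For anyone trying to reproduce this, here is a minimal sketch of how the request bodies for the comparison could be built. It only constructs JSON and does not contact a cluster. The helper names `analyze_body` and `mlt_body` are hypothetical, and querying the `element_detail.kstem_words` sub-field instead of `element_detail` is just one variation to try, not a confirmed fix. The idea is to send the same text to the `_analyze` API once with the query-time analyzer and once against the field, and compare the tokens that come back.

```python
import json

INDEX = "evg_dev"  # index name taken from the hits shown above
LIKE_TEXT = "Tit fot tat tata"  # the "like" text from the query above


def analyze_body(text, analyzer=None, field=None):
    """Build a body for POST /<index>/_analyze.

    Exactly one of `analyzer` or `field` must be given: `analyzer` names
    an analyzer defined on the index, `field` makes Elasticsearch use the
    analyzer configured for that field in the mapping.
    """
    if (analyzer is None) == (field is None):
        raise ValueError("pass exactly one of analyzer or field")
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    else:
        body["field"] = field
    return body


def mlt_body(fields, like, analyzer=None,
             min_term_freq=1, min_doc_freq=1, minimum_should_match="10%"):
    """Build a more_like_this search body (hypothetical helper).

    When `analyzer` is omitted, the key is left out of the query so the
    analyzer of the first listed field is used for the "like" text.
    """
    mlt = {
        "fields": list(fields),
        "like": like,
        "min_term_freq": min_term_freq,
        "min_doc_freq": min_doc_freq,
        "minimum_should_match": minimum_should_match,
    }
    if analyzer is not None:
        mlt["analyzer"] = analyzer
    return {"query": {"more_like_this": mlt}}


if __name__ == "__main__":
    # 1) Tokens produced by the analyzer named in the query.
    print("POST /%s/_analyze" % INDEX)
    print(json.dumps(analyze_body(LIKE_TEXT, analyzer="html_analyzer_without_tags"), indent=2))

    # 2) Tokens produced by the analyzers of the indexed field and sub-field.
    for field in ("element_detail", "element_detail.kstem_words"):
        print("POST /%s/_analyze" % INDEX)
        print(json.dumps(analyze_body(LIKE_TEXT, field=field), indent=2))

    # 3) Variation to try (an assumption, not a confirmed fix): query the
    #    sub-field and let it supply its own analyzer.
    print("POST /%s/_search" % INDEX)
    print(json.dumps(mlt_body(["element_detail.kstem_words"], LIKE_TEXT), indent=2))
```

If the tokens from step 1 differ from those from step 2 (for example in stemming or stop word handling), the terms picked from the "like" text may not line up with the terms stored in the index for that field, which would be worth ruling out before tuning min_doc_freq or minimum_should_match any further.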