More Like This - Elasticsearch not giving correct results

Time: 2018-10-17 11:28:13

Tags: elasticsearch elastic-stack morelikethis

Current data -

    "hits": {
        "total": 2,
        "max_score": 38.91894,
        "hits": [
            {
                "_index": "evg_dev",
                "_type": "component",
                "_id": "907784",
                "_score": 38.91894,
                "_source": {
                    "component_type": "para",
                    "qual_data_desc": "test_text_136",
                    "last_changed_by": "testuserevg",
                    "document_used": "",
                    "element_detail": "<para><para>Tit fot&nbsp;tat&nbsp;tit</para></para><para/>",
                    "datetime_created": "2018-10-16T12:31:33.932Z",
                    "datetime_last_changed": "2018-10-16T13:13:15.372Z",
                    "created_by": "testuserevg"
                }
            },
            {
                "_index": "evg_dev",
                "_type": "component",
                "_id": "907783",
                "_score": 37.329224,
                "_source": {
                    "component_type": "para",
                    "qual_data_desc": "test_evg_213",
                    "last_changed_by": "testuserevg",
                    "document_used": "",
                    "element_detail": "<para><para>tit fot&nbsp;tat</para></para><para/>",
                    "datetime_created": "2018-10-15T14:39:15.696Z",
                    "datetime_last_changed": "2018-10-15T14:42:34.145Z",
                    "created_by": "testuserevg"
                }
            }
        ]
     }

This is the analyzer and mapping in use for it -

    "term_vector_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["asciifolding", "lowercase", "word_delimiter",
                   "kstem", "english_stopwords"],
        "char_filter": ["html_strip"]
    }

    "element_detail": {
        "type": "text",
        "fields": {
            "kstem_words": {
                "type": "text",
                "analyzer": "term_vector_analyzer"
            }
        }
    },

When we try to fetch results with a More Like This query, we do not get the correct results.

This is my More Like This query -

    {
        "query": {
            "more_like_this": {
                "fields": ["element_detail"],
                "analyzer": "html_analyzer_without_tags",
                "like": "Tit fot tat tata",
                "min_term_freq": "1",
                "min_doc_freq": "1",
                "minimum_should_match": "10%"
            }
        }
    }

And html_analyzer_without_tags is -

    "html_analyzer_without_tags": {
        "type": "custom",
        "tokenizer": "keyword",
        "filter": ["asciifolding", "lowercase", "word_delimiter",
                   "kstem", "stemmed_appasense_stopwords_filter"],
        "char_filter": ["no_escape_tag_char_filter"]
    }
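For comparison, here is a minimal sketch that builds the same request body next to a variant aimed at the `element_detail.kstem_words` sub-field from the mapping. The helper name `build_mlt_query` is made up for illustration, and the variant is only something to try, not a confirmed fix. It leaves out the `analyzer` override on the assumption that `more_like_this` then falls back to the analyzer of the first field listed in `fields`, so the query text would be analyzed the same way as the indexed sub-field.

```python
import json


def build_mlt_query(like_text, field="element_detail", analyzer=None,
                    min_term_freq=1, min_doc_freq=1,
                    minimum_should_match="10%"):
    """Build a more_like_this request body as a plain dict (hypothetical helper)."""
    mlt = {
        "fields": [field],
        "like": like_text,
        "min_term_freq": min_term_freq,
        "min_doc_freq": min_doc_freq,
        "minimum_should_match": minimum_should_match,
    }
    # Only send an analyzer override when one is explicitly requested.
    if analyzer is not None:
        mlt["analyzer"] = analyzer
    return {"query": {"more_like_this": mlt}}


# The query from the question: top-level field plus an analyzer override.
original = build_mlt_query("Tit fot tat tata",
                           analyzer="html_analyzer_without_tags")

# Variant to try: the sub-field from the mapping, without an override.
variant = build_mlt_query("Tit fot tat tata",
                          field="element_detail.kstem_words")

if __name__ == "__main__":
    print(json.dumps(original, indent=2))
    print(json.dumps(variant, indent=2))
```

Either printed body can be sent as a search request against the `evg_dev` index; comparing the two hit lists shows whether the choice of field and analyzer changes the result.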

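We also added parameters such as min_doc_freq, but none of them had any effect. Is this related to the mapping settings, and is that why we are getting these wrong results?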

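We also checked whether the analyzer produces the correct values, and it does: it tokenizes the text and returns each word. Even lowering "minimum_should_match" to 1% did not work for us.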
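The analyzer check described above can be sketched as a set of `_analyze` request bodies. This is only an illustration: the helper `analyze_body` is hypothetical, the index name and sample texts are copied from the data shown earlier, and the script prints the requests without sending them. The first body analyzes the query text with the analyzer named in the query. The other two analyze a stored `element_detail` value with whatever analyzer each mapped field uses, so the query-time and index-time terms can be compared side by side.

```python
import json

INDEX = "evg_dev"  # index name taken from the hits shown in the question
QUERY_TEXT = "Tit fot tat tata"
STORED_TEXT = "<para><para>Tit fot&nbsp;tat&nbsp;tit</para></para><para/>"


def analyze_body(text, analyzer=None, field=None):
    """Build an _analyze body for a named analyzer or a mapped field (hypothetical helper)."""
    if (analyzer is None) == (field is None):
        raise ValueError("pass exactly one of analyzer or field")
    if analyzer is not None:
        return {"analyzer": analyzer, "text": text}
    return {"field": field, "text": text}


checks = [
    ("query-time terms",
     analyze_body(QUERY_TEXT, analyzer="html_analyzer_without_tags")),
    ("index-time terms for element_detail",
     analyze_body(STORED_TEXT, field="element_detail")),
    ("index-time terms for element_detail.kstem_words",
     analyze_body(STORED_TEXT, field="element_detail.kstem_words")),
]

if __name__ == "__main__":
    for label, body in checks:
        print(f"# {label}")
        print(f"POST /{INDEX}/_analyze")
        print(json.dumps(body, indent=2))
```

If the terms returned for the query text do not appear among the terms returned for the field being searched, the query has nothing to match on in that field, whatever the value of `minimum_should_match`.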

0 answers:

No answers