Elasticsearch returns broken HTML tags when I use highlighting

Time: 2016-08-23 14:26:58

Tags: html elasticsearch tags highlight querydsl

My content field holds an HTML string such as:

"content": "<h3><a href=\"http://blog.local/page/%D8%A2%D8%B2%D8%A7%D8%AF\">The Matrix has you </a></h3>follow the white rabbit."

I set "fragment_size" : 150 to limit the length, in characters, of each highlighted fragment, but Elasticsearch returns a substring that contains a broken HTML tag:

    "highlight": {
        "content": [
            "&#x2F;%D8%A2%D8%B2%D8%A7%D8%AF&quot;&gt;The <em>Matrix</em> has"
        ]
    }
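
To see why a size-limited fragment can begin in the middle of a tag, here is a minimal Python sketch. It is not Elasticsearch's actual fragmenter: `naive_fragment` and its `before`/`after` window sizes are hypothetical, chosen only so that a fixed character window around the match reproduces the fragment in the output above (before any HTML-encoding).

```python
# Simplified illustration only: this is NOT how Elasticsearch builds fragments.
content = (
    '<h3><a href="http://blog.local/page/%D8%A2%D8%B2%D8%A7%D8%AF">'
    "The Matrix has you </a></h3>follow the white rabbit."
)


def naive_fragment(text: str, term: str, before: int, after: int) -> str:
    """Return a fixed-size character window around the first match of `term`."""
    start = text.lower().find(term.lower())
    if start == -1:
        return ""
    lo = max(0, start - before)
    hi = min(len(text), start + len(term) + after)
    return text[lo:hi]


# Hypothetical window sizes; the window ignores markup, so it starts
# inside the href attribute of the <a> tag.
fragment = naive_fragment(content, "matrix", before=31, after=4)
print(fragment)
```

Because the window is measured in characters and knows nothing about markup, the opening `<a href="` is cut off and only the tail of the tag remains in the fragment.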

How can I fix this in the JSON-based query DSL?

{
    "query": {
        "filtered": { 
            "query": {
                "multi_match": {
                    "query": "matrix",
                    "fields": ["title","content"]
                }
            },
            "filter": {
                "term": { "content_type": "page" }
            }
        }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        }
    }
}

Here is a sample response:

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.98773545,
        "hits": [
            {
                "_index": "myindex",
                "_type": "post",
                "_id": "101",
                "_score": 0.024953224,
                "_source": {
                    "ID": 101,
                    "content_type": "page",
                    "date": "1999-02-18 14:32:21",
                    "title": "Wake up, Neo",
                    "content": "<h3><a href=\"http://blog.local/page/%D8%A2%D8%B2%D8%A7%D8%AF\">The Matrix has you </a></h3>follow the white rabbit."
                },
                "highlight": {
                    "content": [
                        "&#x2F;%D8%A2%D8%B2%D8%A7%D8%AF&quot;&gt;the <em>matrix</em> has"
                    ]
                }
            }
        ]
    }
}

1 Answer:

Answer 0: (score: 0)

I have not tried this, but I think you should specify "encoder": "html" in the highlight section:

{
    "query": {
        "filtered": { 
            "query": {
                "multi_match": {
                    "query": "matrix",
                    "fields": ["title","content"]
                }
            },
            "filter": {
                "term": { "content_type": "page" }
            }
        }
    },
    "highlight" : {
        "order" : "score",
        "fields" : {
            "content" : {"fragment_size" : 150, "number_of_fragments" : 3}
        },
        "encoder": "html"
    }
}

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html
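
As a rough illustration of what HTML-encoding a highlighted fragment means, here is a small Python sketch. It rests on assumptions: `encode_and_highlight` is a hypothetical helper, and Python's `html.escape` is only a stand-in for Elasticsearch's encoder (for example, it does not escape `/`, whereas the output in the question shows `&#x2F;`).

```python
import html
import re


def encode_and_highlight(fragment: str, term: str) -> str:
    """Escape HTML in `fragment`, then wrap each match of `term` in <em> tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    pieces = []
    last = 0
    for match in pattern.finditer(fragment):
        # Escape the text before the match, then add the highlighted match.
        pieces.append(html.escape(fragment[last:match.start()]))
        pieces.append("<em>" + html.escape(match.group(0)) + "</em>")
        last = match.end()
    # Escape whatever follows the last match.
    pieces.append(html.escape(fragment[last:]))
    return "".join(pieces)


# A raw fragment that starts in the middle of the <a> tag from the question.
raw_fragment = '/%D8%A2%D8%B2%D8%A7%D8%AF">The Matrix has'
encoded = encode_and_highlight(raw_fragment, "matrix")
print(encoded)
```

In this sketch the escaping makes the leftover `">` render as literal text while the `<em>` tags stay as markup. It does not restore the part of the tag that the fragment cut off.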