Highlighting in Elasticsearch autocomplete

Date: 2016-11-11 15:22:56

Tags: elasticsearch

I have the following data to index in Elasticsearch.

enter image description here

I want to implement autocomplete and highlight why a particular document matched the query.

These are the settings for my index:

{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

Index-time analysis:

  • Split the text on word boundaries.
  • Remove punctuation.
  • Lowercase.
  • Edge n-gram each token.

So the inverted index looks like:

enter image description here
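
As a rough illustration of the analysis steps listed above, the following Python sketch emulates the `autocomplete` analyzer outside Elasticsearch. It is only an approximation: the regular expression stands in for the standard tokenizer, and the function names `edge_ngrams` and `autocomplete_analyze` are made up for this example.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` from min_gram to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_analyze(text):
    """Approximate the index-time chain: tokenize, lowercase, edge n-gram."""
    # Assumption: \w+ is a simplified stand-in for the standard tokenizer.
    tokens = re.findall(r"\w+", text)
    terms = []
    for token in tokens:
        terms.extend(edge_ngrams(token.lower()))
    return terms


print(autocomplete_analyze("Software AG"))
```

Every prefix of every lowercased word becomes its own term, which is why a query for `soft` can find `Software`.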

This is how I defined the mapping for the name field:

{
    "index_type": {
        "properties": {
            "name": {
                "type":     "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

When I query:

GET http://localhost:9200/index/type/_search

{
    "query": {
        "match": {
            "name": "soft"
        }
    },
    "highlight": {
        "fields" : {
            "name" : {}
        }
    }
}

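Search: soft

With the standard tokenizer applied, "soft" is the term looked up in the inverted index. This search matches documents 1, 3, 4, 5, 6 and 7, which is correct, but I expected the highlighted part to be "soft" rather than the whole word: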

{
  "hits": [
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> AG2"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> AG"
        ]
      }
    },
    {
      "_source": {
        "name": "is soft ware ok"
      },
      "highlight": {
        "name": [
          "is <em>soft</em> ware ok"
        ]
      }
    }
  ]
}

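Search: software ag

With the standard tokenizer applied, "software ag" is split into "software" and "ag", which are then looked up in the inverted index. This search matches documents 1, 3, 4, 5 and 6, which is correct, but I expected the highlighted parts to be "software" and "ag" rather than the whole words containing them: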

{
  "hits": [
    {
      "_source": {
        "name": "Software AG"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Software AG2"
      },
      "highlight": {
        "name": [
          "<em>Software</em> <em>AG2</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em>"
        ]
      }
    },
    {
      "_source": {
        "name": "Op Software AG good software better"
      },
      "highlight": {
        "name": [
          "Op <em>Software</em> <em>AG</em> good <em>software</em> better"
        ]
      }
    },
    {
      "_source": {
        "name": "SoftwareRocks everytime"
      },
      "highlight": {
        "name": [
          "<em>SoftwareRocks</em> everytime"
        ]
      }
    }
  ]
}
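
To make it easier to see why each of these documents matches, here is a small Python emulation of the lookup. It is a sketch under stated assumptions, not Elasticsearch itself: `index_terms`, `query_terms` and `matching_terms` are hypothetical helpers, the regular expression only approximates the standard tokenizer, and the query words are combined with OR, which is the default operator of a `match` query.

```python
import re


def index_terms(text, min_gram=1, max_gram=15):
    """Emulate index-time analysis: lowercase words expanded to edge n-grams."""
    terms = set()
    for token in re.findall(r"\w+", text.lower()):
        for n in range(min_gram, min(len(token), max_gram) + 1):
            terms.add(token[:n])
    return terms


def query_terms(text):
    """Emulate search-time analysis: lowercase words, no n-grams."""
    return re.findall(r"\w+", text.lower())


def matching_terms(name, query):
    """Return the query terms found among the indexed terms of `name`."""
    indexed = index_terms(name)
    return [term for term in query_terms(query) if term in indexed]


names = ["Software AG", "Software AG2", "SoftwareRocks everytime", "is soft ware ok"]
for name in names:
    # A document is a hit when at least one query term matches (OR semantics).
    print(name, "->", matching_terms(name, "software ag"))
```

In this emulation `SoftwareRocks everytime` is a hit only because `software` is one of the prefixes of `softwarerocks`, while `is soft ware ok` contains neither term.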

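I read the Elasticsearch highlighting documentation, but I cannot work out how highlighting is performed. For both examples above, I expected only the token matched in the inverted index to be highlighted, not the whole word. Can anyone help me highlight only the value that was passed in?

Update

It seems that on the Elasticsearch website the server-side autocomplete is similar to my implementation. However, they appear to highlight the matched query on the client. If that is what they do, I am starting to think there is no proper solution on the Elasticsearch side, so I implemented the highlighting on the server side rather than on the client side (as they seem to do).

My server-side implementation (in PHP) is: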

public function search($term)
{
    $params = [
        'index' => $this->getIndexName(),
        'type' => $this->getIndexType(),
        'body' => [
            'query' => [
                'match' => [
                    'name' => $term
                ]
            ]
        ]
    ];

    $results = $this->client->search($params);

    $hits = $results['hits']['hits'];

    $data = [];

    $wrapBefore = '<strong>';
    $wrapAfter = '</strong>';

    foreach ($hits as $hit) {
        $data[] = [
            $hit['_source']['id'],
            $hit['_source']['name'],
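            // preg_quote() stops regex metacharacters in the user's term from being read as pattern syntax.
            preg_replace('/(' . preg_quote($term, '/') . ')/i', "$wrapBefore$1$wrapAfter", strip_tags($hit['_source']['name']))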
        ];
    }

    return $data;
}
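
The function above wraps the search term as one literal string, so a two-word query such as `software ag` is only wrapped where both words appear together in that order. A per-word variant of the same post-processing idea could look like the Python sketch below. The name `highlight_prefixes` is hypothetical, the code assumes the name is plain text, and it splits the query on whitespace rather than running the real Elasticsearch analyzer.

```python
import re


def highlight_prefixes(name, query, before="<strong>", after="</strong>"):
    """Wrap each query word where it occurs at the start of a word in `name`."""
    # Escape user input so it is matched literally; try longer words first.
    words = sorted((re.escape(w) for w in query.split()), key=len, reverse=True)
    if not words:
        return name
    # \b anchors the match at a word start, mirroring edge n-gram matching.
    pattern = re.compile(r"\b(" + "|".join(words) + r")", re.IGNORECASE)
    # A function replacement avoids backslash handling in the wrapper strings.
    return pattern.sub(lambda m: before + m.group(1) + after, name)


print(highlight_prefixes("SoftwareRocks everytime", "soft"))
print(highlight_prefixes("Op Software AG good software better", "software ag"))
```

Anchoring at word starts keeps the highlighting consistent with what the edge n-gram index can actually match, so `ag` is not marked in the middle of a longer word.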

The resulting output, which is my goal for this question:

enter image description here

I added a bounty to see whether there is a solution at the Elasticsearch level that achieves what I described above.

1 answer:

Answer 0 (score: 1)

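As of now, with the latest version of Elasticsearch, this is not possible, because the highlighting documentation does not mention any setting or query for it. I inspected the Elastic autocomplete example in the browser console, under the XHR requests tab, and found that the autocomplete response for the keyword "att" looks like this.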

url - https://search.elastic.co/suggest?q=att
    {
        "current_page": 1,
        "last_page": 4,
        "total_hits": 49,
        "hits": [
            {
                "tags": [],
                "url": "/elasticon/tour/2016/jp/not-attending",
                "section": "Elasticon",
                "title": "Not <em>Attending</em> - JP"
            },
            {
                "section": "Elasticon",
                "title": "<em>Attending</em> from Training - JP",
                "tags": [],
                "url": "/elasticon/tour/2016/jp/attending-training"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/jp/attending-keynote",
                "title": "<em>Attending</em> from Keynote - JP",
                "section": "Elasticon"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/not-attending",
                "section": "Elasticon",
                "title": "Thank You - Not <em>Attending</em>"
            },
            {
                "tags": [],
                "url": "/elasticon/tour/2016/attending",
                "section": "Elasticon",
                "title": "Thank You - <em>Attending</em>"
            },
            {
                "section": "Blog",
                "title": "What It's Like to <em>Attend</em> Elastic Training",
                "tags": [],
                "url": "/blog/what-its-like-to-attend-elastic-training"
            },
            {
                "tags": "Elasticsearch",
                "url": "/guide/en/elasticsearch/plugins/5.0/mapper-attachments-highlighting.html",
                "section": "Docs/",
                "title": "Highlighting <em>attachments</em>"
            },
            {
                "title": "<em>attachments</em> » email",
                "section": "Docs/",
                "tags": "Logstash",
                "url": "/guide/en/logstash/5.0/plugins-outputs-email.html#plugins-outputs-email-attachments"
            },
            {
                "section": "Docs/",
                "title": "Configuring Email <em>Attachments</em> » Actions",
                "tags": "Watcher",
                "url": "/guide/en/watcher/2.4/actions.html#configuring-email-attachments"
            },
            {
                "url": "/guide/en/watcher/2.4/actions.html#hipchat-action-attributes",
                "tags": "Watcher",
                "title": "HipChat Action <em>Attributes</em> » Actions",
                "section": "Docs/"
            },
            {
                "title": "Slack Action <em>Attributes</em> » Actions",
                "section": "Docs/",
                "tags": "Watcher",
                "url": "/guide/en/watcher/2.4/actions.html#slack-action-attributes"
            }
        ],
        "aggs": {
            "sections": [
                {
                    "Elasticon": 5
                },
                {
                    "Blog": 1
                },
                {
                    "Docs/": 43
                }
            ],
            "top_tags": [
                {
                    "XPack": 14
                },
                {
                    "Elasticsearch": 12
                },
                {
                    "Watcher": 9
                },
                {
                    "Logstash": 4
                },
                {
                    "Clients": 3
                },
                {
                    "Shield": 1
                }
            ]
        }
    }

On the front end, however, only "att" is shown highlighted in the autosuggest results. So they are handling the highlighting in the browser layer.
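
One way such a post-processing step could work is sketched below in Python. This is an assumption for illustration, not a description of what the elastic.co front end really does: the hypothetical function `narrow_highlight` takes a title as returned in the response above and shrinks each `<em>` span down to the typed prefix.

```python
import re

# Non-greedy match of one highlighted span as returned by the server.
EM_SPAN = re.compile(r"<em>(.*?)</em>")


def narrow_highlight(title, typed):
    """Shrink each <em>word</em> span so that only the typed prefix is marked."""

    def shrink(match):
        word = match.group(1)
        # Only narrow spans that really start with what the user typed.
        if typed and word.lower().startswith(typed.lower()):
            cut = len(typed)
            return "<em>" + word[:cut] + "</em>" + word[cut:]
        # Otherwise keep the server's highlighting unchanged.
        return match.group(0)

    return EM_SPAN.sub(shrink, title)


print(narrow_highlight("Not <em>Attending</em> - JP", "att"))
print(narrow_highlight("Highlighting <em>attachments</em>", "att"))
```

Reusing the server's `<em>` spans means the client does not have to guess which words matched; it only decides how much of each matched word to mark.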