Question

If I start with an index that is created this way:

PUT newsindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2,
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "english": {
          "tokenizer": "standard",
          "filter": [
            "my_stop",
            "english_stemmer"
          ]
        }
      }
    }
  },
  "mappings" : {
    "news": {
      "properties": {
        "newsid": {
          "type": "integer"
        },
        "newstype": {
          "type": "text"
        },
        "bodytext": {
          "type": "text"
        },
        "caption": {
          "type": "text"
        },
        "headline": {
          "type": "text"
        },
        "approved": {
          "type": "text"
        },
        "author": {
          "type": "text"
        },
        "contact": {
          "type": "text"
        },
        "datecreated": {
          "type": "date",
          "format": "date_time"
        },
        "datesubmitted": {
          "type": "date",
          "format": "date_time"
        },
        "lastmodifieddate": {
          "type": "date",
          "format": "date_time"
        }
      }
    }
  }
}

If I perform a search for 'Scotland', I get 158 documents back. Spot-checking a few of them, they all contain the word in one of my search fields.

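I now want my output to stop including 'stop' words in my search. That is, with the setup above, if I search for 'is', 'this' or 'that' on their own, I correctly get no records back, but if I search for 'Scotland' I get thousands of documents, because the search now includes documents that contain only the one word, so the search appears to have removed the stop words. So, against the columns I am searching on, I introduce the following: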

 "analyzer": "english",
 "search_analyzer" : "english"

For example, bodytext becomes:

  "bodytext": {
      "type": "text",
      "analyzer": "english",
      "search_analyzer": "english"
  }

But now if I perform the same search for 'Scotland', I get only 6 documents back.
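
One way to check what the custom `english` analyzer actually emits for a given input is the `_analyze` API. The request below is only a sketch: it assumes the index is still named `newsindex`, and the sample sentence is made up for illustration.

```
GET newsindex/_analyze
{
  "analyzer": "english",
  "text": "Scotland is the best"
}
```

The response lists the tokens that would be indexed. Note that the filter chain defined above contains no `lowercase` filter, so tokens may keep their original case. That could matter here, because `wildcard` and `term` queries are not analyzed at search time and compare against the stored tokens exactly.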

Sticking with Scotland for now, this is the search that gets built:

{
  "query": {
    "bool": { "should": [
      { "wildcard": { "headline": { "value": "*scotland*" } } },
      { "wildcard": { "bodytext": { "value": "*scotland*" } } }
    ] }
  },
  "_source": ["headline", "bodytext", "datesubmitted", "newsid"]
}

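Following @Jpountz's comment below, if I change wildcard to term and remove the * characters from my query, I get all the documents back as expected. If I change the value to "is", I also get back every document that contains it. The stop words are now being ignored completely.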
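
For reference, the `term` variant described above would look roughly like the following. This is a sketch reconstructed from the wildcard query, assuming the same two fields and the same `_source` list, with only the query type and the value changed.

```
{
  "query": {
    "bool": {
      "should": [
        { "term": { "headline": { "value": "scotland" } } },
        { "term": { "bodytext": { "value": "scotland" } } }
      ]
    }
  },
  "_source": ["headline", "bodytext", "datesubmitted", "newsid"]
}
```

Swapping "scotland" for "is" in both clauses gives the second case mentioned.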

Elasticsearch analyzers giving unexpected behaviour

0 answers: