Elasticsearch index analyzers appear to do nothing after being added

Time: 2018-05-25 02:33:29

Tags: elasticsearch

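I am new to ES and following the documentation on dealing with human language using different analyzers (https://www.elastic.co/guide/en/elasticsearch/guide/current/languages.html). After working through some of the examples, it seems that the added analyzers have no effect on searches at all. For example: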

## init some index for testing
PUT /testindex
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3,
    "analysis": {},
    "refresh_interval": "1s"
  },
  "mappings": {
    "testtype": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

## adding some analyzers for...
POST /testindex/_close
##... simple lowercase tokenization, ...(https://www.elastic.co/guide/en/elasticsearch/guide/current/lowercase-token-filter.html#lowercase-token-filter)
PUT /testindex/_settings
{
    "analysis": {
      "analyzer": {
        "my_lowercaser": {
          "tokenizer": "standard",
          "filter":  [ "lowercase" ]
        }
      }
    }
}
## ... normalization (https://www.elastic.co/guide/en/elasticsearch/guide/current/algorithmic-stemmers.html#_using_an_algorithmic_stemmer), ...
PUT testindex/_settings
{
  "analysis": {
    "filter": {
      "english_stop": {
        "type":       "stop",
        "stopwords":  "_english_"
      },
      "light_english_stemmer": {
        "type":       "stemmer",
        "language":   "light_english" 
      },
      "english_possessive_stemmer": {
        "type":       "stemmer",
        "language":   "possessive_english"
      }
    },
    "analyzer": {
      "english": {
        "tokenizer": "standard",
        "filter": [
          "english_possessive_stemmer",
          "lowercase",
          "english_stop",
          "light_english_stemmer", 
          "asciifolding" 
        ]
      }
    }
  }
}
## ... and using a hunspell dictionary (https://www.elastic.co/guide/en/elasticsearch/guide/current/hunspell.html#hunspell)
PUT testindex/_settings
{
  "analysis": {
    "filter": {
      "en_US": {
        "type":     "hunspell",
        "language": "en_US" 
      }
    },
    "analyzer": {
      "en_US": {
        "tokenizer":  "standard",
        "filter":   [ 
          "lowercase", 
          "en_US" 
          ]
      }
    }
  }
}
POST /testindex/_open
GET testindex/_settings
## it appears as though the analyzers have been added without problem

## adding some testing data
POST /testindex/testtype
{
  "title": "Will the root word of movement be found?"
}
POST /testindex/testtype
{
  "title": "That's why I never want to hear you say, ehhh I waant it thaaat away."
}

## expecting to match against root word of movement (move)
GET /testindex/testtype/_search
{
  "query": {
    "match": {
      "title": "moving"
    }
  }
}
## which returns 0 hits, as shown below

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

## ... yet I can see that the record expected does in fact exist in the index when using...
GET /testindex/testtype/_search
{
  "query": {
    "match_all": {}
  }
}

Then, thinking that I needed to actually "add" the analyzers to a (new) field, I did the following (still with negative results):

# adding the analyzers to a new field
POST /testindex/testtype
{
  "mappings": {
      "properties": {
        "title2": {
          "type": "text",
          "analyzer": [
            "my_lowercaser",
            "english",
            "en_US"
            ]
        }
      }
  }
}
# looking at the tokens I'd expect to be able to find
GET /testindex/_analyze
{
  "analyzer": "en_US", 
  "text": "Moving between directories"
}
# moving, move, between, directory

# what I actually see
GET /testindex/_analyze
{
  "field": "title2", 
  "text": "Moving between directories"
}
# moving, between, directories

Even trying something simpler:

POST /testindex/testtype
    {
      "mappings": {
          "properties": {
            "title2": {
              "type": "text",
              "analyzer": "en_US"
            }
          }
      }
    }

That did not work at all.

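So this all seems very confusing. Am I missing something about how these analyzers are supposed to work? Should these analyzers work as configured (based on the information provided), and am I simply misusing them? If so, could someone provide an example query that actually works and gets a hit?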

Is there any other debugging information that should be added here?

1 answer:

Answer 0 (score: 0)

The title2 field is given 3 analyzers, but according to your output (from the _analyze endpoint), it seems that only my_lowercaser is being applied.
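
For reference, the analyzer parameter of a text field takes a single analyzer name rather than a list, and POST /testindex/testtype with a JSON body indexes that body as a document instead of changing the mapping. A minimal sketch of one way to attach several analyzers to the same text, assuming Elasticsearch 6.x with mapping types as in the question, is the put-mapping API combined with multi-fields. The sub-field names stemmed and hunspell are hypothetical, chosen here only for illustration, and the en_US analyzer assumes a hunspell dictionary is installed on every node.

```
# Add title2 through the put-mapping API (6.x syntax with a mapping type).
# Each field or sub-field gets exactly one analyzer.
PUT /testindex/_mapping/testtype
{
  "properties": {
    "title2": {
      "type": "text",
      "analyzer": "my_lowercaser",
      "fields": {
        "stemmed": {
          "type": "text",
          "analyzer": "english"
        },
        "hunspell": {
          "type": "text",
          "analyzer": "en_US"
        }
      }
    }
  }
}

# Check which tokens a given sub-field produces
GET /testindex/_analyze
{
  "field": "title2.hunspell",
  "text": "Moving between directories"
}
```

Only documents indexed with a title2 value after this mapping change would be searchable through title2 and its sub-fields.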

In the end, the configuration that worked for me using hunspell is:

"settings": {
    "analysis": {
      "filter": {
        "en_US": {
          "type":     "hunspell",
          "language": "en_US" 
        }
      },
      "analyzer": {
        "en_US": {
          "tokenizer":  "standard",
          "filter":   [ "lowercase", "en_US" ]
        }
      }
    }
  }

"mappings": {
    "_doc": {
      "properties": {
        "title-en-us": {
          "type": "text",
          "analyzer": "en_US"
        }
      }
    }
  }
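
The two fragments above could be assembled into a single index-creation request along the following lines. This is only a sketch: the index name hunspell_test and the sample document are hypothetical, and it assumes Elasticsearch 6.x (where the _doc mapping type is accepted) with the en_US .aff and .dic files placed under config/hunspell/en_US on each node.

```
# Create the index with the hunspell filter, the analyzer, and the mapping in one request
PUT /hunspell_test
{
  "settings": {
    "analysis": {
      "filter": {
        "en_US": {
          "type":     "hunspell",
          "language": "en_US"
        }
      },
      "analyzer": {
        "en_US": {
          "tokenizer": "standard",
          "filter":    [ "lowercase", "en_US" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title-en-us": {
          "type": "text",
          "analyzer": "en_US"
        }
      }
    }
  }
}

# Index a sample document and make it visible to search immediately
POST /hunspell_test/_doc?refresh=true
{
  "title-en-us": "Moving between directories"
}

# Query the analyzed field
GET /hunspell_test/_search
{
  "query": {
    "match": {
      "title-en-us": "move"
    }
  }
}
```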

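movement is not reduced to move or moving (possibly related to the hunspell dictionary). Querying with move returns only the documents containing moving, not those containing movement.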
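
One way to check this observation, sketched under the assumption that the en_US analyzer from the question is defined on testindex, is to run the three words through the _analyze API and compare the emitted tokens. A match query finds a document only when a query term and an indexed term are identical after analysis, so words that do not share a token will not match each other.

```
# Compare the tokens produced for each word by the hunspell-based analyzer
GET /testindex/_analyze
{
  "analyzer": "en_US",
  "text": [ "movement", "moving", "move" ]
}

# Same request with per-stage detail (tokenizer and each token filter)
GET /testindex/_analyze
{
  "analyzer": "en_US",
  "text": [ "movement", "moving", "move" ],
  "explain": true
}
```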