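I'm new to ES and have been following the docs on dealing with human language using different analyzers (https://www.elastic.co/guide/en/elasticsearch/guide/current/languages.html). After working through some of the examples, it appears that the added analyzers have no effect on searches at all. For example: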
## init some index for testing
PUT /testindex
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 3,
    "analysis": {},
    "refresh_interval": "1s"
  },
  "mappings": {
    "testtype": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}
## adding some analyzers for...
POST /testindex/_close
##... simple lowercase tokenization, ...(https://www.elastic.co/guide/en/elasticsearch/guide/current/lowercase-token-filter.html#lowercase-token-filter)
PUT /testindex/_settings
{
  "analysis": {
    "analyzer": {
      "my_lowercaser": {
        "tokenizer": "standard",
        "filter": [ "lowercase" ]
      }
    }
  }
}
## ... normalization (https://www.elastic.co/guide/en/elasticsearch/guide/current/algorithmic-stemmers.html#_using_an_algorithmic_stemmer), ...
PUT testindex/_settings
{
  "analysis": {
    "filter": {
      "english_stop": {
        "type": "stop",
        "stopwords": "_english_"
      },
      "light_english_stemmer": {
        "type": "stemmer",
        "language": "light_english"
      },
      "english_possessive_stemmer": {
        "type": "stemmer",
        "language": "possessive_english"
      }
    },
    "analyzer": {
      "english": {
        "tokenizer": "standard",
        "filter": [
          "english_possessive_stemmer",
          "lowercase",
          "english_stop",
          "light_english_stemmer",
          "asciifolding"
        ]
      }
    }
  }
}
## ... and using a hunspell dictionary (https://www.elastic.co/guide/en/elasticsearch/guide/current/hunspell.html#hunspell)
PUT testindex/_settings
{
  "analysis": {
    "filter": {
      "en_US": {
        "type": "hunspell",
        "language": "en_US"
      }
    },
    "analyzer": {
      "en_US": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "en_US"
        ]
      }
    }
  }
}
POST /testindex/_open
GET testindex/_settings
## it appears as though the analyzers have been added without problem
## adding some testing data
POST /testindex/testtype
{
  "title": "Will the root word of movement be found?"
}
POST /testindex/testtype
{
  "title": "That's why I never want to hear you say, ehhh I waant it thaaat away."
}
## expecting to match against root word of movement (move)
GET /testindex/testtype/_search
{
  "query": {
    "match": {
      "title": "moving"
    }
  }
}
## which returns 0 hits, as shown below
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
## ... yet I can see that the record expected does in fact exist in the index when using...
GET /testindex/testtype/_search
{
  "query": {
    "match_all": {}
  }
}
Then, thinking that I actually need to "add" the analyzers to a (new) field, I do the following (which still shows negative results):
# adding the analyzers to a new field
POST /testindex/testtype
{
  "mappings": {
    "properties": {
      "title2": {
        "type": "text",
        "analyzer": [
          "my_lowercaser",
          "english",
          "en_US"
        ]
      }
    }
  }
}
# looking at the tokens I'd expect to be able to find
GET /testindex/_analyze
{
  "analyzer": "en_US",
  "text": "Moving between directories"
}
# moving, move, between, directory
# what I actually see
GET /testindex/_analyze
{
  "field": "title2",
  "text": "Moving between directories"
}
# moving, between, directories
Even trying something simpler, such as
POST /testindex/testtype
{
  "mappings": {
    "properties": {
      "title2": {
        "type": "text",
        "analyzer": "en_US"
      }
    }
  }
}
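does not work at all.
So this all seems very confusing. Am I missing something about how these analyzers are supposed to work? Should these analyzers work properly (based on the information provided), and am I just misusing them here? If so, could someone provide an example query that would actually work and return a hit?
** Is there any other debugging information that should be added here?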
Answer 0 (score: 0)
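The title2 field has 3 analyzers, but according to your output (from the analyze endpoint), it seems that only my_lowercaser is applied.
In the end, the configuration that worked for me using hunspell is: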
{
  "settings": {
    "analysis": {
      "filter": {
        "en_US": {
          "type": "hunspell",
          "language": "en_US"
        }
      },
      "analyzer": {
        "en_US": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "en_US" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title-en-us": {
          "type": "text",
          "analyzer": "en_US"
        }
      }
    }
  }
}
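To check how this configuration behaves, something like the following can be run against an index created with it. This is only a sketch: hunspell_test is a hypothetical index name (the configuration above does not name one), the _doc type in the search path assumes the typed, pre-7.x style API used in this thread, and the hunspell filter assumes an en_US dictionary is installed under the node's config/hunspell/en_US directory.

```console
# hunspell_test is a hypothetical index created with the settings and mappings above

# Show the tokens the en_US analyzer produces for both words
GET /hunspell_test/_analyze
{
  "analyzer": "en_US",
  "text": "movement moving"
}

# Search the analyzed field with the root word
GET /hunspell_test/_doc/_search
{
  "query": {
    "match": {
      "title-en-us": "move"
    }
  }
}
```

Comparing the tokens from the first request with the hits from the second shows which surface forms the dictionary maps to the same root.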
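movement is not resolved to move, whereas moving is (probably related to the hunspell dictionary). Querying with move returns only the documents containing moving, not those containing movement.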
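Coming back to title2, here is a sketch rather than a confirmed diagnosis. The analyzer mapping parameter takes a single analyzer name, not a list, and POST /testindex/testtype indexes a document rather than changing the mapping. On an existing index, fields are added through the put-mapping endpoint. The typed path below assumes the same pre-7.x API style as the question, and it assumes title2 is not yet present in the mapping.

```console
# See which fields and analyzers are actually mapped right now
GET /testindex/_mapping

# Add title2 via the put-mapping API, with exactly one analyzer
PUT /testindex/_mapping/testtype
{
  "properties": {
    "title2": {
      "type": "text",
      "analyzer": "en_US"
    }
  }
}

# Let _analyze derive the analyzer from the field mapping
GET /testindex/_analyze
{
  "field": "title2",
  "text": "Moving between directories"
}
```

If the last request now returns the same tokens as calling _analyze with "analyzer": "en_US" directly, the field is using the intended analyzer. Documents indexed before the mapping change would need to be reindexed to get a value in title2.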