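I'm basically trying to disable the lowercase filter so that I can do case-sensitive matching on a text field. Following the index and analyzer docs, I created the following index with a custom analyzer that has no lowercase filter:
PUT /my_index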
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "asciifolding"
          ]
        }
      }
    }
  }
}
I enabled fielddata so that I can inspect the tokenization later:
PUT my_index/_mapping/_doc
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true
    }
  }
}
I tested the custom analyzer to make sure that, as expected, it does not lowercase:
POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà Vu</b>?"
}
and got the following response:
{
  "tokens": [
    {
      "token": "Is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "deja",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "Vu",
      "start_offset": 16,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
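As a rough cross-check of that output, the three analysis stages can be approximated outside Elasticsearch. This is only a sketch using the Python standard library: the function names are made up for illustration, and the regular expressions are simplified stand-ins for the real html_strip char filter, standard tokenizer and asciifolding filter, not their actual implementations.

```python
import re
import unicodedata


def strip_html(text):
    # Stand-in for the html_strip char filter: drop anything that looks like a tag.
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text):
    # Stand-in for the standard tokenizer: keep runs of word characters.
    return re.findall(r"\w+", text)


def ascii_fold(token):
    # Stand-in for asciifolding: decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def my_custom_analyzer(text):
    # Same order as in the index settings: char filter, tokenizer, token filter.
    return [ascii_fold(tok) for tok in tokenize(strip_html(text))]


# "déjà" is written with escapes so the example does not depend on the file encoding.
tokens = my_custom_analyzer("Is this <b>d\u00e9j\u00e0 Vu</b>?")
print(tokens)  # ['Is', 'this', 'deja', 'Vu']
```

Nothing in this chain touches letter case, which is consistent with "Is" and "Vu" keeping their capitals in the _analyze response.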
Great, nothing gets lowercased, which is what I wanted. So now I index the same text to see what happens:
POST /my_index/_doc
{
  "my_field": "Is this <b>déjà Vu</b>?"
}
and try to query it:
POST /my_index/_search
{
  "query": {
    "regexp": {
      "my_field": "Is.*"
    }
  },
  "docvalue_fields": [
    "my_field"
  ]
}
This returns no hits. Now, if I try the same regular expression in lowercase, I do get a result:
POST /my_index/_search
{
  "query": {
    "regexp": {
      "my_field": "is.*"
    }
  },
  "docvalue_fields": [
    "my_field"
  ]
}
which returns:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "6d6PP20BXDCQSINU0RC_",
        "_score": 1,
        "_source": {
          "my_field": "Is this <b>déjà Vu</b>?"
        },
        "fields": {
          "my_field": [
            "b",
            "déjà",
            "is",
            "this",
            "vu"
          ]
        }
      }
    ]
  }
}
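Since only the lowercase regular expression matches, and the doc values all come back in lowercase, it looks to me as if something is still lowercasing the text somewhere. What am I doing wrong here?
Answer 0 (score: 1)
Good start so far!
The only problem is that you never applied the custom analyzer to the field, so the field is still analyzed with the default analyzer, which lowercases. Change your mapping to this and it will get you further.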
PUT my_index/_mapping/_doc
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true,
      "analyzer": "my_custom_analyzer" <-- add this
    }
  }
}
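To see why the analyzer on the field matters here: a regexp query is a term-level query, so the pattern is not analyzed and has to match a whole indexed term. The sketch below imitates that in plain Python. It rests on assumptions: `default_terms` and `custom_terms` are made-up helpers that only approximate the default analyzer and my_custom_analyzer, and Python's `re.fullmatch` stands in for Lucene's regular expression matching, whose syntax differs in other cases.

```python
import re
import unicodedata

# "déjà" is written with escapes so the example does not depend on the file encoding.
TEXT = "Is this <b>d\u00e9j\u00e0 Vu</b>?"


def default_terms(text):
    # Approximation of the default analyzer: split into words and lowercase them.
    # There is no HTML stripping, so the "b" from the tags becomes a term too.
    return sorted(set(re.findall(r"\w+", text.lower())))


def custom_terms(text):
    # Approximation of my_custom_analyzer: strip tags, split, fold accents, keep case.
    stripped = re.sub(r"<[^>]+>", "", text)
    folded = []
    for tok in re.findall(r"\w+", stripped):
        decomposed = unicodedata.normalize("NFKD", tok)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return sorted(set(folded))


def regexp_hits(terms, pattern):
    # The pattern must match an entire term, not a substring of it.
    return [t for t in terms if re.fullmatch(pattern, t)]


print(default_terms(TEXT))                      # ['b', 'déjà', 'is', 'this', 'vu']
print(regexp_hits(default_terms(TEXT), "Is.*"))  # []
print(regexp_hits(default_terms(TEXT), "is.*"))  # ['is']
print(regexp_hits(custom_terms(TEXT), "Is.*"))   # ['Is']
```

The first list is the same set of terms that the search response returned under "fields", which suggests the document was indexed with the default analyzer. If Elasticsearch rejects the mapping update because my_field already exists with a different analyzer, recreating the index with the analyzer set in the initial mapping and indexing the document again is the usual way around it, as far as I know.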