I am new to Elasticsearch. I have mapped a field as 'string' in my Elasticsearch index. I need to retrieve documents whose field value contains the given search text.
JSON1: "{\"id\":\"1\",\"message\":\"Welcome to elastic search\"}"
JSON2: "{\"id\":\"2\",\"message\":\"elasticsearch\"}"
If I search for 'elastic', I need to get both records, but I am only getting the first one.
Right now I am only getting documents via full-text search on whole words. Please guide me on how to achieve a psql ILIKE-style (contains) search in Elasticsearch.
Thanks in advance.
Answer (score: 1)
This is a tokenization issue. Take a look at the nGram tokenizer: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenizer/
You can experiment with the /_analyze endpoint.
Here is how Elasticsearch tokenizes text by default:
curl -XGET 'localhost:9200/_analyze?tokenizer=standard' -d 'this is a test elasticsearch'
{
"tokens": [{
"token": "this",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 1
}, {
"token": "is",
"start_offset": 5,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 2
}, {
"token": "a",
"start_offset": 8,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 3
}, {
"token": "test",
"start_offset": 10,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 4
}, {
"token": "elasticsearch",
"start_offset": 15,
"end_offset": 28,
"type": "<ALPHANUM>",
"position": 5
}
]
}
The standard tokenizer emits whole words, so 'elasticsearch' is indexed as a single token and a search for 'elastic' will not match it. Here is the same text analyzed with the nGram tokenizer and its default settings (1- and 2-character grams):
curl -XGET 'localhost:9200/_analyze?tokenizer=nGram' -d 'this is a test elasticsearch'
{
"tokens": [{
"token": "t",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 1
}, {
"token": "h",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 2
}, {
"token": "i",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 3
}, {
"token": "s",
"start_offset": 3,
"end_offset": 4,
"type": "word",
"position": 4
}, {
"token": " ",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 5
}, {
"token": "i",
"start_offset": 5,
"end_offset": 6,
"type": "word",
"position": 6
}, {
"token": "s",
"start_offset": 6,
"end_offset": 7,
"type": "word",
"position": 7
}, {
"token": " ",
"start_offset": 7,
"end_offset": 8,
"type": "word",
"position": 8
}, {
"token": "a",
"start_offset": 8,
"end_offset": 9,
"type": "word",
"position": 9
}, {
"token": " ",
"start_offset": 9,
"end_offset": 10,
"type": "word",
"position": 10
}, {
"token": "t",
"start_offset": 10,
"end_offset": 11,
"type": "word",
"position": 11
}, {
"token": "e",
"start_offset": 11,
"end_offset": 12,
"type": "word",
"position": 12
}, {
"token": "s",
"start_offset": 12,
"end_offset": 13,
"type": "word",
"position": 13
}, {
"token": "t",
"start_offset": 13,
"end_offset": 14,
"type": "word",
"position": 14
}, {
"token": " ",
"start_offset": 14,
"end_offset": 15,
"type": "word",
"position": 15
}, {
"token": "e",
"start_offset": 15,
"end_offset": 16,
"type": "word",
"position": 16
}, {
"token": "l",
"start_offset": 16,
"end_offset": 17,
"type": "word",
"position": 17
}, {
"token": "a",
"start_offset": 17,
"end_offset": 18,
"type": "word",
"position": 18
}, {
"token": "s",
"start_offset": 18,
"end_offset": 19,
"type": "word",
"position": 19
}, {
"token": "t",
"start_offset": 19,
"end_offset": 20,
"type": "word",
"position": 20
}, {
"token": "i",
"start_offset": 20,
"end_offset": 21,
"type": "word",
"position": 21
}, {
"token": "c",
"start_offset": 21,
"end_offset": 22,
"type": "word",
"position": 22
}, {
"token": "s",
"start_offset": 22,
"end_offset": 23,
"type": "word",
"position": 23
}, {
"token": "e",
"start_offset": 23,
"end_offset": 24,
"type": "word",
"position": 24
}, {
"token": "a",
"start_offset": 24,
"end_offset": 25,
"type": "word",
"position": 25
}, {
"token": "r",
"start_offset": 25,
"end_offset": 26,
"type": "word",
"position": 26
}, {
"token": "c",
"start_offset": 26,
"end_offset": 27,
"type": "word",
"position": 27
}, {
"token": "h",
"start_offset": 27,
"end_offset": 28,
"type": "word",
"position": 28
}, {
"token": "th",
"start_offset": 0,
"end_offset": 2,
"type": "word",
"position": 29
}, {
"token": "hi",
"start_offset": 1,
"end_offset": 3,
"type": "word",
"position": 30
}, {
"token": "is",
"start_offset": 2,
"end_offset": 4,
"type": "word",
"position": 31
}, {
"token": "s ",
"start_offset": 3,
"end_offset": 5,
"type": "word",
"position": 32
}, {
"token": " i",
"start_offset": 4,
"end_offset": 6,
"type": "word",
"position": 33
}, {
"token": "is",
"start_offset": 5,
"end_offset": 7,
"type": "word",
"position": 34
}, {
"token": "s ",
"start_offset": 6,
"end_offset": 8,
"type": "word",
"position": 35
}, {
"token": " a",
"start_offset": 7,
"end_offset": 9,
"type": "word",
"position": 36
}, {
"token": "a ",
"start_offset": 8,
"end_offset": 10,
"type": "word",
"position": 37
}, {
"token": " t",
"start_offset": 9,
"end_offset": 11,
"type": "word",
"position": 38
}, {
"token": "te",
"start_offset": 10,
"end_offset": 12,
"type": "word",
"position": 39
}, {
"token": "es",
"start_offset": 11,
"end_offset": 13,
"type": "word",
"position": 40
}, {
"token": "st",
"start_offset": 12,
"end_offset": 14,
"type": "word",
"position": 41
}, {
"token": "t ",
"start_offset": 13,
"end_offset": 15,
"type": "word",
"position": 42
}, {
"token": " e",
"start_offset": 14,
"end_offset": 16,
"type": "word",
"position": 43
}, {
"token": "el",
"start_offset": 15,
"end_offset": 17,
"type": "word",
"position": 44
}, {
"token": "la",
"start_offset": 16,
"end_offset": 18,
"type": "word",
"position": 45
}, {
"token": "as",
"start_offset": 17,
"end_offset": 19,
"type": "word",
"position": 46
}, {
"token": "st",
"start_offset": 18,
"end_offset": 20,
"type": "word",
"position": 47
}, {
"token": "ti",
"start_offset": 19,
"end_offset": 21,
"type": "word",
"position": 48
}, {
"token": "ic",
"start_offset": 20,
"end_offset": 22,
"type": "word",
"position": 49
}, {
"token": "cs",
"start_offset": 21,
"end_offset": 23,
"type": "word",
"position": 50
}, {
"token": "se",
"start_offset": 22,
"end_offset": 24,
"type": "word",
"position": 51
}, {
"token": "ea",
"start_offset": 23,
"end_offset": 25,
"type": "word",
"position": 52
}, {
"token": "ar",
"start_offset": 24,
"end_offset": 26,
"type": "word",
"position": 53
}, {
"token": "rc",
"start_offset": 25,
"end_offset": 27,
"type": "word",
"position": 54
}, {
"token": "ch",
"start_offset": 26,
"end_offset": 28,
"type": "word",
"position": 55
}
]
}
The nGram tokenizer indexes substrings, so a query for 'elastic' can then match 'elasticsearch'. Here is an example link for setting up the right analyzer/tokenizer on your index: How to setup a tokenizer in elasticsearch
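For illustration, here is a minimal sketch of such a setup; the index name my_index, the type doc, the field message, and the gram sizes 3 to 8 are assumptions for the example, not values from the question:
# Assumed example: create an index whose "message" field is analyzed
# with a custom nGram analyzer (grams of 3 to 8 characters, lowercased)
curl -XPUT 'localhost:9200/my_index' -d '{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": { "type": "nGram", "min_gram": 3, "max_gram": 8 }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "message": { "type": "string", "analyzer": "my_ngram_analyzer" }
      }
    }
  }
}'
A min_gram of 3 keeps the index considerably smaller than the default 1/2-character grams while still allowing substring matches of three characters or more.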
With the nGram analysis in place, your query should then return both expected documents.
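For example, a simple match query against that field (again a sketch, reusing the assumed my_index and message names from above):
# Search for documents whose "message" field contains "elastic"
curl -XGET 'localhost:9200/my_index/_search' -d '{
  "query": {
    "match": { "message": "elastic" }
  }
}'
This should return both sample documents, since the nGram analyzer indexed substrings of "elasticsearch" as well as the separate word "elastic".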