The results are shown here: Query hits
I searched for "hey", and one of the records returned was "hello".
Here is another example: Query hits
I searched again, this time for "infrared", and a record with the following content came back: "This is a message at index: 1".
These are the index settings:
settings analysis: {
  filter: {
    edge_ngram_filter: {
      type: "edge_ngram",
      min_gram: "2",
      max_gram: "20",
    }
  },
  analyzer: {
    edge_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "edge_ngram_filter"]
    }
  }
} do
  mappings dynamic: true do
    indexes :content, type: :text, analyzer: "edge_ngram_analyzer"
    # indexes :chat_id, type: :long
  end
end
Answer 0 (score: 0)
With your index mapping, the tokens generated for hey will be:
GET /_analyze
{
  "tokens": [
    {
      "token": "he",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "hey",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    }
  ]
}
The tokens generated for hello will be:
GET /_analyze
{
  "tokens": [
    {
      "token": "he",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "hel",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 1
    },
    {
      "token": "hell",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 2
    },
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 3
    }
  ]
}
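Both token lists contain the token he. A match query runs the search text through the same analyzer as the field (no separate search analyzer is configured) and, by default, matches a document if any one of the resulting tokens is found. So when you search for hey, the query token he also hits the document containing hello, and both documents match.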
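The overlap can be reproduced outside Elasticsearch with a few lines of Ruby. This is only a rough sketch: `edge_ngrams` is a hypothetical helper that mimics the prefix generation of an edge n-gram filter for a single lowercase word, and it ignores tokenization, offsets and positions.

```ruby
# Simplified stand-in for an edge_ngram filter (hypothetical helper, not an
# Elasticsearch API): every prefix of the word whose length lies between
# min_gram and max_gram.
def edge_ngrams(word, min_gram: 2, max_gram: 20)
  upper = [word.length, max_gram].min
  (min_gram..upper).map { |len| word[0, len] }
end

hey_tokens   = edge_ngrams("hey")    # what the query text is turned into
hello_tokens = edge_ngrams("hello")  # what the indexed document is turned into

puts "hey:    #{hey_tokens.inspect}"
puts "hello:  #{hello_tokens.inspect}"
puts "shared: #{(hey_tokens & hello_tokens).inspect}"
```

Any non-empty intersection is enough for the match query to return the document, which is why a two-character minimum makes short, unrelated words collide.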
Change your index mapping to:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3, // note this: raised from 2 to 3
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    },
    "max_ngram_diff": 10
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
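Since the question defines the index through a Ruby `settings`/`mappings` DSL, the same analysis section could be written as a Ruby hash. This is a sketch under the assumption that the DSL accepts the same keys as the JSON above; the constant name `EDGE_NGRAM_ANALYSIS` is made up for illustration.

```ruby
require "json"

# Hypothetical constant; the hash mirrors the "analysis" section of the JSON
# settings in this answer, with min_gram raised to 3.
EDGE_NGRAM_ANALYSIS = {
  analyzer: {
    my_analyzer: { tokenizer: "my_tokenizer" }
  },
  tokenizer: {
    my_tokenizer: {
      type: "edge_ngram",
      min_gram: 3,
      max_gram: 10,
      token_chars: %w[letter digit]
    }
  }
}.freeze

# Assumed usage inside the model from the question (not executed here):
#   settings analysis: EDGE_NGRAM_ANALYSIS do
#     mappings dynamic: true do
#       indexes :content, type: :text, analyzer: "my_analyzer"
#     end
#   end

# Print the hash as JSON to compare it with the settings above.
puts JSON.pretty_generate(analysis: EDGE_NGRAM_ANALYSIS)
```

Printing the hash makes it easy to check by eye that the Ruby version and the JSON version describe the same tokenizer before recreating the index.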
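Now use the analyze API against the index (my_analyzer is defined in the index settings, so the index name has to be part of the path):
GET /66754045/_analyze
{
  "analyzer": "my_analyzer",
  "text": "hey"
}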
The tokens will be:
{
  "tokens": [
    {
      "token": "hey",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    }
  ]
}
Index data:
{
  "content": "hey"
}
{
  "content": "hello"
}
Search query:
{
  "query": {
    "match": {
      "content": "hey"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "66754045",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.8713851,
    "_source": {
      "content": "hey"
    }
  }
]
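The effect of raising min_gram can also be checked with a toy model in Ruby. This is a simplified sketch, not how Elasticsearch scores or retrieves documents: `edge_ngrams` and `matching_docs` are hypothetical helpers, each document is treated as a single word, and a document counts as a hit when it shares at least one gram with the query.

```ruby
# Hypothetical helper: all prefixes of the word with a length between
# min_gram and max_gram.
def edge_ngrams(word, min_gram:, max_gram:)
  upper = [word.length, max_gram].min
  (min_gram..upper).map { |len| word[0, len] }
end

# Toy model of a match query on single-word documents: a document is a hit
# when at least one query gram equals one of its indexed grams.
def matching_docs(docs, query, min_gram:, max_gram:)
  query_grams = edge_ngrams(query, min_gram: min_gram, max_gram: max_gram)
  docs.select do |doc|
    doc_grams = edge_ngrams(doc, min_gram: min_gram, max_gram: max_gram)
    (doc_grams & query_grams).any?
  end
end

docs = %w[hey hello]

# Original settings: min_gram 2, max_gram 20
p matching_docs(docs, "hey", min_gram: 2, max_gram: 20)

# Settings from this answer: min_gram 3, max_gram 10
p matching_docs(docs, "hey", min_gram: 3, max_gram: 10)
```

With min_gram 2 both documents come back; with min_gram 3 only hey does, in line with the search result above. Words that share their first three characters would still match each other, which is the intended prefix behaviour of edge n-grams.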