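I want to use the shingle token filter to analyze the string "The quick brown fox jumps over the lazy dog" into:
1. The
2. The quick
...
N. The quick brown fox jumps over the lazy dog
I need help. Thanks.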
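Answer 0 (score: 0)
With the following index settings, which define a custom analyzer using the shingle token filter, you will be able to generate the terms you expect: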
curl -XPUT localhost:9200/your_index -d '{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "my_shingles": {
            "tokenizer": "standard",
            "filter": [
              "shingles"
            ]
          }
        },
        "filter": {
          "shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 10
          }
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "field": {
          "type": "string",
          "analyzer": "my_shingles"
        }
      }
    }
  }
}'
We can then ask the _analyze endpoint to show how it tokenizes your sentence:
curl -XGET 'localhost:9200/your_index/_analyze?analyzer=my_shingles&pretty' -d 'The quick brown fox jumps over the lazy dog'
The response will be:
{
  "tokens" : [ {
    "token" : "The",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "The quick",
    "start_offset" : 0,
    "end_offset" : 9,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown",
    "start_offset" : 0,
    "end_offset" : 15,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox",
    "start_offset" : 0,
    "end_offset" : 19,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps",
    "start_offset" : 0,
    "end_offset" : 25,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over",
    "start_offset" : 0,
    "end_offset" : 30,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over the",
    "start_offset" : 0,
    "end_offset" : 34,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over the lazy",
    "start_offset" : 0,
    "end_offset" : 39,
    "type" : "shingle",
    "position" : 1
  }, {
    "token" : "The quick brown fox jumps over the lazy dog",
    "start_offset" : 0,
    "end_offset" : 43,
    "type" : "shingle",
    "position" : 1
  }, {
  ...
You will notice that many more shingles are produced as well, but the ones above match what you expect.
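To see why more shingles appear than the ones you listed, here is a minimal Python sketch of what a shingle filter does conceptually. It is not Elasticsearch's implementation: the `shingles` function and its parameters are hypothetical names chosen to mirror the filter settings above, and a plain whitespace split stands in for the standard tokenizer.

```python
def shingles(tokens, min_size=2, max_size=10, output_unigrams=True):
    """Return (start_index, term) pairs for every shingle of the token list.

    Hypothetical helper that mimics the idea of a shingle filter:
    at each token position, emit the single token (if output_unigrams)
    and then every run of min_size..max_size consecutive tokens.
    """
    result = []
    for start in range(len(tokens)):
        if output_unigrams:
            result.append((start, tokens[start]))
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a shingle of this size
            result.append((start, " ".join(tokens[start:start + size])))
    return result


text = "The quick brown fox jumps over the lazy dog"
tokens = text.split()  # assumption: whitespace split instead of the standard tokenizer

all_shingles = shingles(tokens)

# Keep only the terms that begin at the first token, i.e. the list from the question.
prefix_only = [term for start, term in all_shingles if start == 0]

for term in prefix_only:
    print(term)
```

For this nine-word sentence the sketch yields 45 terms in total, nine of which start at the first word. Those nine are the terms from your list; the remaining ones ("quick brown", "brown fox jumps", and so on) are the extra shingles mentioned above.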