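Some characters, such as #, are treated as delimiters, so they never match in a query. What is the custom analyzer configuration closest to the standard one that would allow these characters to match?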
Answer 0 (score: 2)
1) The simplest way is to use the whitespace tokenizer together with the lowercase filter.
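To show why this option keeps the # character, here is a minimal Python sketch that approximates the whitespace tokenizer plus lowercase filter outside Elasticsearch. It is an emulation under assumptions, not Elasticsearch code: the function name whitespace_lowercase is hypothetical, and Python's notion of whitespace may differ slightly from Lucene's.

```python
import re
from typing import List, Tuple


def whitespace_lowercase(text: str) -> List[Tuple[str, int, int]]:
    """Rough emulation of Elasticsearch's whitespace tokenizer followed by the
    lowercase token filter (an approximation, not Elasticsearch code).

    Splits only on runs of whitespace, so punctuation such as '#' stays inside
    the token, then lowercases each token. Returns (token, start, end) tuples
    whose offsets point into the original text.
    """
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


if __name__ == "__main__":
    for token, start, end in whitespace_lowercase("new year #celebration vegas"):
        print(token, start, end)
```

On the sample text this yields the same tokens and offsets as the _analyze output for this option: new, year, #celebration and vegas.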
curl -XGET 'localhost:9200/_analyze?tokenizer=whitespace&filters=lowercase&pretty' -d 'new year #celebration vegas'
will give you
{
"tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "year",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 2
}, {
"token" : "#celebration",
"start_offset" : 9,
"end_offset" : 21,
"type" : "word",
"position" : 3
}, {
"token" : "vegas",
"start_offset" : 22,
"end_offset" : 27,
"type" : "word",
"position" : 4
} ]
}
2) If you only want to keep certain special characters, you can map them with a char filter so that the text is converted into something else before tokenization happens. This stays closer to the standard analyzer. For example, you can create your index like this:
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"special_analyzer": {
"char_filter": [
"special_mapping"
],
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
}
},
"char_filter": {
"special_mapping": {
"type": "mapping",
"mappings": [
"#=>hashtag\\u0020"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"tweet": {
"type": "string",
"analyzer": "special_analyzer"
}
}
}
}
}
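The order of operations in special_analyzer (char filter on the raw text, then the tokenizer, then lowercase, then asciifolding) can be approximated in a few lines of Python. This is a rough sketch under stated assumptions: the regex \w+ only loosely imitates the standard tokenizer, which follows Unicode word-break rules; the NFKD-based ascii_fold only approximates the asciifolding filter; and SPECIAL_MAPPING, map_chars, standard_like_tokenize and special_analyzer are hypothetical names.

```python
import re
import unicodedata
from typing import Dict, List

# Hypothetical stand-in for the "special_mapping" char filter above:
# '#' becomes "hashtag" followed by a space (the \u0020 in the mapping rule).
SPECIAL_MAPPING: Dict[str, str] = {"#": "hashtag "}


def map_chars(text: str, mapping: Dict[str, str]) -> str:
    # A mapping char filter rewrites the raw text before tokenization.
    # This sketch only handles single-character keys.
    return "".join(mapping.get(ch, ch) for ch in text)


def standard_like_tokenize(text: str) -> List[str]:
    # Very rough approximation of the standard tokenizer: keep runs of
    # word characters and drop punctuation such as '#'.
    return re.findall(r"\w+", text)


def ascii_fold(token: str) -> str:
    # Approximation of asciifolding: decompose accented characters and
    # drop the combining marks (e.g. "café" -> "cafe").
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def special_analyzer(text: str) -> List[str]:
    # char filter -> tokenizer -> lowercase -> asciifolding, as configured above.
    mapped = map_chars(text, SPECIAL_MAPPING)
    return [ascii_fold(tok.lower()) for tok in standard_like_tokenize(mapped)]


if __name__ == "__main__":
    print(special_analyzer("new year #celebration vegas"))
    # ['new', 'year', 'hashtag', 'celebration', 'vegas']
```

Without the char filter, the same tokenizer simply drops the #, which is the behaviour the question complains about; with it, the # survives as the extra term hashtag.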
Now the custom analyzer will generate the following tokens:
curl -XPOST 'localhost:9200/my_index/_analyze?analyzer=special_analyzer&pretty' -d 'new year #celebration vegas'
{
"tokens" : [ {
"token" : "new",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "year",
"start_offset" : 4,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "hashtag",
"start_offset" : 9,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "celebration",
"start_offset" : 10,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 4
}, {
"token" : "vegas",
"start_offset" : 22,
"end_offset" : 27,
"type" : "<ALPHANUM>",
"position" : 5
} ]
}
So you can search like this:
GET my_index/_search
{
"query": {
"match": {
"tweet": "#celebration"
}
}
}
You can also search for just celebration, because the mapping appends a unicode space (\\u0020) after hashtag. Otherwise #celebration would be indexed as the single token hashtagcelebration, and we would always have to search with the hashtag prefix.
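Why both queries match can be checked with a toy model of a match query: the query string is run through the field's analyzer and, with the default or operator, a document matches when any query term is among its indexed terms. The Python below is a hypothetical sketch that ignores scoring entirely; make_analyzer, match_or and the two mappings are illustrative names, and the analyzer is the same rough regex approximation of the standard tokenizer, not Elasticsearch itself. It compares the mapping with the trailing \u0020 against one without it.

```python
import re
from typing import Callable, Dict, List

Analyzer = Callable[[str], List[str]]


def make_analyzer(mapping: Dict[str, str]) -> Analyzer:
    """Rough stand-in for a custom analyzer: mapping char filter,
    standard-like tokenizer (runs of word characters) and lowercase filter."""
    def analyze(text: str) -> List[str]:
        mapped = "".join(mapping.get(ch, ch) for ch in text)
        return [tok.lower() for tok in re.findall(r"\w+", mapped)]
    return analyze


def match_or(analyze: Analyzer, field_value: str, query: str) -> bool:
    """Toy match query with OR semantics: analyze the query with the field's
    analyzer and match if any query term is among the indexed terms."""
    indexed = set(analyze(field_value))
    return any(term in indexed for term in analyze(query))


with_space = make_analyzer({"#": "hashtag "})    # like "#=>hashtag\\u0020"
without_space = make_analyzer({"#": "hashtag"})  # hypothetical "#=>hashtag"

tweet = "new year #celebration vegas"

if __name__ == "__main__":
    for query in ("#celebration", "celebration"):
        print(query, match_or(with_space, tweet, query), match_or(without_space, tweet, query))
```

In this model the shared hashtag term also means a query such as #party matches the sample tweet; setting the match query's operator to and would require every query term to be present.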
Hope this helps!!