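I am trying to analyze documents in Elasticsearch with the Smart Chinese Analyzer, but instead of returning the analyzed Chinese characters, Elasticsearch returns their Unicode code points. For example: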
PUT /test_chinese
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "smartcn"
          }
        }
      }
    }
  }
}
GET /test_chinese/_analyze?text='我说世界好!'
I expected to get each Chinese character back, but instead I get:
{
  "tokens": [
    {
      "token": "25105",
      "start_offset": 3,
      "end_offset": 8,
      "type": "word",
      "position": 4
    },
    {
      "token": "35828",
      "start_offset": 11,
      "end_offset": 16,
      "type": "word",
      "position": 8
    },
    {
      "token": "19990",
      "start_offset": 19,
      "end_offset": 24,
      "type": "word",
      "position": 12
    },
    {
      "token": "30028",
      "start_offset": 27,
      "end_offset": 32,
      "type": "word",
      "position": 16
    },
    {
      "token": "22909",
      "start_offset": 35,
      "end_offset": 40,
      "type": "word",
      "position": 20
    }
  ]
}
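For reference, the five token values look like the decimal Unicode code points of the five characters in the request. A minimal Python sketch to check this (the variable names are only illustrative):

```python
# Token values copied from the _analyze response above.
tokens = ["25105", "35828", "19990", "30028", "22909"]

# Treat each token as a decimal Unicode code point and convert it to a character.
decoded = "".join(chr(int(t)) for t in tokens)

print(decoded)  # prints 我说世界好
```

So the analyzer does see something derived from the right characters, just not the characters themselves.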
Do you know what is happening?
Thanks!
Answer 0 (score: 0)
I found the cause of my problem. It appears to be a bug in Sense.
Here is the conversation with Zachary Tong, an Elasticsearch developer: https://discuss.elastic.co/t/smart-chinese-analysis-returns-unicodes-instead-of-chinese-tokens/37133
And here is the ticket that tracks the bug: https://github.com/elastic/sense/issues/88
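One reading that fits the offsets in the response is that the Chinese characters reached Elasticsearch as decimal numeric character references such as `&#25105;`. This is an inference from the output, not something stated in the linked discussion or ticket. The Python sketch below encodes the query-string value that way and lists where each run of digits falls; `to_numeric_references` is a hypothetical helper written for this illustration, not part of Sense or Elasticsearch.

```python
import re


def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal reference such as &#25105;."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 127 else ch for ch in text)


# The query-string value from the request, including its single quotes.
# Assumption: the exclamation mark is the plain ASCII one.
encoded = to_numeric_references("'我说世界好!'")

# Collect each run of digits together with its start and end offset.
digit_runs = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", encoded)]

for run in digit_runs:
    print(run)
```

The digit runs come out at offsets 3 to 8, 11 to 16, 19 to 24, 27 to 32 and 35 to 40, which are the same `start_offset` and `end_offset` values as in the response. If this reading is right, sending the same request from a client other than Sense should return the Chinese tokens.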