Smart Chinese analysis in Elasticsearch returns Unicode codes instead of characters

Date: 2015-12-14 11:44:46

Tags: elasticsearch

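I am trying to analyze documents in Elasticsearch with the Smart Chinese Analyzer. However, instead of returning the analyzed Chinese characters, Elasticsearch returns the Unicode codes of those characters. For example: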

PUT /test_chinese
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "smartcn"
                    }
                }
            }
        }
    }
}

GET /test_chinese/_analyze?text='我说世界好!'
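To rule out the client that issues the request, the same analysis can be requested with the text in a JSON body from any HTTP client. The following is a minimal Python sketch using only the standard library. It assumes Elasticsearch 2.x or later (where `_analyze` accepts a JSON body with a `text` field) and a node at `http://localhost:9200`; neither is stated in the question, and the helper name `build_analyze_request` is hypothetical. It only builds the request; the commented lines show how it could be sent.

```python
import json
import urllib.request

TEXT = "我说世界好!"

def build_analyze_request(host: str, index: str, text: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to <index>/_analyze."""
    # Assumption: Elasticsearch 2.x+, where _analyze accepts {"text": ...} as a
    # JSON body. With no "analyzer" field, the index's default analyzer
    # (smartcn in the settings above) is expected to be used.
    # ensure_ascii=False sends the characters as raw UTF-8, not \u escapes.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )

req = build_analyze_request("http://localhost:9200", "test_chinese", TEXT)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it to a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Putting the text in the body also avoids URL-encoding the characters into the query string by hand.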

I expected to get each Chinese character back as a token, but instead I get:

{
    "tokens": [
        {
            "token": "25105",
            "start_offset": 3,
            "end_offset": 8,
            "type": "word",
            "position": 4
        },
        {
            "token": "35828",
            "start_offset": 11,
            "end_offset": 16,
            "type": "word",
            "position": 8
        },
        {
            "token": "19990",
            "start_offset": 19,
            "end_offset": 24,
            "type": "word",
            "position": 12
        },
        {
            "token": "30028",
            "start_offset": 27,
            "end_offset": 32,
            "type": "word",
            "position": 16
        },
        {
            "token": "22909",
            "start_offset": 35,
            "end_offset": 40,
            "type": "word",
            "position": 20
        }
    ]
}

Do you know what is happening?

Thanks!

1 Answer:

Answer 0 (score: 0)

I found the cause of my problem: it seems there is a bug in Sense. Here is the conversation with Zachary Tong, Elasticsearch developer: https://discuss.elastic.co/t/smart-chinese-analysis-returns-unicodes-instead-of-chinese-tokens/37133 And here is the issue where the bug was reported: https://github.com/elastic/sense/issues/88
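The numbers in the question's output are consistent with a client-side encoding problem. Each token is the decimal Unicode code point of one of the five characters (25105 is 我, U+6211), and the offsets line up exactly with a text in which every character was replaced by an HTML numeric character reference such as `&#25105;`. The Python sketch below is an illustrative check, not a description of Sense's internals. The encoding step and the helper name `to_numeric_refs` are assumptions inferred from the offsets. It rebuilds such a string and compares its digit runs with the reported tokens.

```python
import html
import re

# Tokens and [start, end) offsets copied from the _analyze response in the question.
REPORTED = [
    ("25105", 3, 8),
    ("35828", 11, 16),
    ("19990", 19, 24),
    ("30028", 27, 32),
    ("22909", 35, 40),
]

def to_numeric_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII char with &#<code point>;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

# Assumption: the analyzer received the leading quote followed by the five
# characters in encoded form. Only this prefix is needed to reproduce the
# five returned tokens.
received = "'" + to_numeric_refs("我说世界好")

# Each maximal run of digits in the received text, with its [start, end) offsets.
digit_runs = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", received)]

print(received)                  # '&#25105;&#35828;&#19990;&#30028;&#22909;
print(digit_runs == REPORTED)    # True
print(html.unescape(received))   # '我说世界好
```

The analyzer then simply split the digits out of the references, since `&`, `#` and `;` are punctuation. That is why the tokens are numbers rather than Chinese words.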