Question

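I am trying to analyze documents in Elasticsearch with the Smart Chinese Analyzer, but instead of the analyzed Chinese characters, Elasticsearch returns the Unicode code points of those characters (as decimal numbers). For example: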

PUT /test_chinese
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "smartcn"
                    }
                }
            }
        }
    }
}

GET /test_chinese/_analyze?text='我说世界好!'
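The request above passes the text as a query-string parameter. An ordinary HTTP client would percent-encode that text as UTF-8 and would not convert it to HTML entities. The Python sketch below builds that URL. The host `http://localhost:9200` and the helper name `analyze_url` are hypothetical, and the code only builds and decodes the URL offline without contacting Elasticsearch. The single quotes are kept because they were part of the original request text.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# The text from the question; the single quotes were part of the original request.
text = "'我说世界好!'"

def analyze_url(index: str, text: str, host: str = "http://localhost:9200") -> str:
    """Hypothetical helper: build the _analyze URL with `text` percent-encoded as UTF-8."""
    return f"{host}/{index}/_analyze?{urlencode({'text': text})}"

url = analyze_url("test_chinese", text)
print(url)

# Decoding the query string gives back the original characters, not HTML entities.
decoded = parse_qs(urlsplit(url).query)["text"][0]
print(decoded == text)
```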

I expected to get back each Chinese character, but instead I get:

{
    "tokens": [
        {
            "token": "25105",
            "start_offset": 3,
            "end_offset": 8,
            "type": "word",
            "position": 4
        },
        {
            "token": "35828",
            "start_offset": 11,
            "end_offset": 16,
            "type": "word",
            "position": 8
        },
        {
            "token": "19990",
            "start_offset": 19,
            "end_offset": 24,
            "type": "word",
            "position": 12
        },
        {
            "token": "30028",
            "start_offset": 27,
            "end_offset": 32,
            "type": "word",
            "position": 16
        },
        {
            "token": "22909",
            "start_offset": 35,
            "end_offset": 40,
            "type": "word",
            "position": 20
        }
    ]
}

Do you know what is going on?

Thanks!

Answer 1

I found the cause of my problem: it appears to be a bug in Sense. Here is my conversation about it with Zachary Tong, an Elasticsearch developer:
https://discuss.elastic.co/t/smart-chinese-analysis-returns-unicodes-instead-of-chinese-tokens/37133
And here is the GitHub issue where the bug was reported:
https://github.com/elastic/sense/issues/88
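The numbers in the response are the decimal Unicode code points of the input characters. For example, 25105 is U+6211, 我. The offsets also fit a specific explanation: the text reached Elasticsearch as HTML numeric character references (`&#25105;` and so on) instead of the characters themselves, and the analyzer kept only the digit runs as "word" tokens. This is consistent with a Sense bug, but it is an inference from the response, not something stated in the linked thread. The minimal Python sketch below checks the inference offline. The encoding step is an assumption that models the hypothesized behavior and is not Sense's actual code.

```python
import html
import re

# Tokens and offsets exactly as returned in the question.
returned = [("25105", 3, 8), ("35828", 11, 16), ("19990", 19, 24),
            ("30028", 27, 32), ("22909", 35, 40)]

original = "我说世界好"

# Assumption: each character was sent as an HTML numeric character reference (&#NNNNN;),
# and the surrounding quotes and "!" were sent unchanged.
encoded = "'" + "".join(f"&#{ord(ch)};" for ch in original) + "!'"

# The runs of digits in that string, with their character offsets.
digit_runs = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", encoded)]

# Turning each returned token back into a character recovers the input text.
decoded = "".join(chr(int(tok)) for tok, _, _ in returned)

print(decoded)                  # the original Chinese text
print(digit_runs == returned)   # do the offsets match the hypothesis?
print(html.unescape(encoded))   # what the text should have been
```

If the hypothesis is right, the digit runs and their offsets match the tokens Elasticsearch returned exactly.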
