自定义分析器,为elasticsearch创建大型char_filter列表

时间:2016-10-21 11:15:02

标签: elasticsearch elasticsearch-plugin query-analyzer

我尝试将自定义分析器添加到弹性搜索中。我有一个太大的“映射”同义词列表(mapper_list)。 mapper_list的大小约为30.000个元素。

Message.aggregate(
    [
        { "$match": { "to": user } },
        { "$sort": { "date": 1 } },
        { "$group": { 
            "_id": "from",
            "to": { "$first": "$to" },
            "message": { "$first": "$message" },
            "date": { "$first": "$date" },
            "origId": { "$first": "$_id" }
        }},
        { "$lookup": {
             "from": "users",
             "localField": "from",
             "foreignField": "_id",
             "as": "from"
        }},
        { "$lookup": {
             "from": "users",
             "localField": "to",
             "foreignField": "_id",
             "as": "to"
        }},
        { "$unwind": { "path" : "$from" } },
        { "$unwind": { "path" : "$to" } }
    ],
    function(err,results) {
        if (err) throw err;
        return results;
    }
)

来自elasetic搜索的错误信息

requests.post(es_host + '/_close')

settings = {
    "settings" : {
        "analysis" : {
            "char_filter" : {
                "my_mapping" : {
                    "type" : "mapping",
                    "mappings" : mapper_list
                }
            },
            "analyzer" : {
                "my_analyzer" : {
                    "tokenizer" : "standard",
                    "char_filter" : ["my_mapping"]
                }
            }
        }
    }
}

requests.put(es_host + '/_settings',
             data=json.dumps(settings))

requests.post(es_host + '/_open')

请关于解决此问题的方法的任何评论。

有关ES版本的信息:

[test-index] IndexCreationException[failed to create index]; nested: ArrayIndexOutOfBoundsException[256];
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:313)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:174)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

1 个答案:

答案 0 :(得分:0)

我认为错误的原因是由于大句子的映射。你想要映射到底是什么?如果查看source code并且您违反了此限制,则最多可包含256个字符。我得到了同样的例外

  

ArrayIndexOutOfBoundsException异常[256]

如果我尝试映射大字符串。

{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": ["More than 256 characters. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. => exception will be thrown"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mapping"
          ]
        }
      }
    }
  }
}

我不知道你的用例,但是你需要减少你要映射的字符串的长度,那么它应该可以工作。