Building a custom tokenizer for elasticsearch

Time: 2017-09-20 13:10:11

Tags: elasticsearch elasticsearch-plugin elasticsearch2

I'm building a custom tokenizer in response to: Performance of doc_values field vs analysed field

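None of the API for this appears to be documented (?), so I'm working from code samples in other plugins/tokenizers. When I restart elastic with my tokenizer deployed, I get this error constantly in the logs: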

[2017-09-20 08:45:37,412][WARN ][indices.cluster          ] [Samuel Silke] [[storm-crawler-2017-09-11][3]] marking and sending shard failed due to [failed to create index]
[storm-crawler-2017-09-11] IndexCreationException[failed to create index]; nested: CreationException[Guice creation errors:

1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
  at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
  at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
  at _unknown_

1 error];
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
    at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Could not find a suitable constructor in com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at com.cameraforensics.elasticsearch.plugins.UrlTokenizerFactory.class(Unknown Source)
  at org.elasticsearch.index.analysis.TokenizerFactoryFactory.create(Unknown Source)
  at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.initialize(Unknown Source)
  at _unknown_

1 error
    at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:360)
    at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:172)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
    at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
    ... 9 more

My tokenizer is built for v2.3.4, and the TokenizerFactory looks like this:

public class UrlTokenizerFactory extends AbstractTokenizerFactory {

    @Inject
    public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, @Assisted String name, @Assisted Settings settings){
        super(index, indexSettings.getSettings(), name, settings);
    }

    @Override
    public Tokenizer create() {
        return new UrlTokenizer();
    }
}

I really don't know what I'm doing wrong. Have I deployed it incorrectly? Going by the logs, it does seem to be picking up my class...

I have only deployed it to one of my es nodes (4-node cluster). The /_cat/plugins?v endpoint reports:

name         component          version type url 
Samuel Silke urltokenizer       2.3.4.0 j        

As there is little or no documentation for this process, I got this far by copying the structure other people have used in their plugins.

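The error I'm seeing makes no sense to me. My TokenizerFactory looks just like everybody else's for this version of elastic. What have I done wrong, or what should I be doing to make this work?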

1 answer:

Answer 0: (score: 0)

It turns out I was missing an Environment parameter in the constructor. It should look like this: public UrlTokenizerFactory(Index index, IndexSettingsService indexSettings, Environment env, @Assisted String name, @Assisted Settings settings){ ...

I eventually found a similar one here: https://github.com/codelibs/elasticsearch-analysis-kuromoji-neologd/blob/2.3.x/src/main/java/org/codelibs/elasticsearch/kuromoji/neologd/index/analysis/KuromojiTokenizerFactory.java
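
For anyone puzzling over the Guice message quoted in the question, the rule it states ("one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private") can be approximated in plain Java with reflection. This is only a rough sketch of the rule as worded in the log, not Guice's or Elasticsearch's actual implementation. The `Inject` annotation, the `hasSuitableConstructor` helper, and the sample classes are all hypothetical stand-ins made up for illustration, and the sketch does not explain why adding the Environment parameter made the error go away.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

public class ConstructorRuleSketch {

    // Hypothetical stand-in annotation for this sketch only;
    // NOT org.elasticsearch.common.inject.Inject.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Inject {}

    // Approximates the rule quoted in the log message: a class is acceptable if
    // it has exactly one @Inject constructor, or, when no constructor is
    // annotated, a zero-argument constructor that is not private.
    static boolean hasSuitableConstructor(Class<?> type) {
        int annotated = 0;
        boolean nonPrivateNoArg = false;
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isAnnotationPresent(Inject.class)) {
                annotated++;
            }
            if (c.getParameterCount() == 0 && !Modifier.isPrivate(c.getModifiers())) {
                nonPrivateNoArg = true;
            }
        }
        if (annotated > 0) {
            return annotated == 1; // "one (and only one)"
        }
        return nonPrivateNoArg;
    }

    // Hypothetical sample classes covering each branch of the rule.
    static class Annotated { @Inject Annotated(String name) {} }
    static class TwoAnnotated { @Inject TwoAnnotated(String a) {} @Inject TwoAnnotated(int b) {} }
    static class NoArg { NoArg() {} }
    static class Neither { Neither(String name) {} }
    static class PrivateOnly { private PrivateOnly() {} }

    public static void main(String[] args) {
        System.out.println("Annotated:    " + hasSuitableConstructor(Annotated.class));
        System.out.println("TwoAnnotated: " + hasSuitableConstructor(TwoAnnotated.class));
        System.out.println("NoArg:        " + hasSuitableConstructor(NoArg.class));
        System.out.println("Neither:      " + hasSuitableConstructor(Neither.class));
        System.out.println("PrivateOnly:  " + hasSuitableConstructor(PrivateOnly.class));
    }
}
```

In a real plugin the annotation would have to be the one Elasticsearch's bundled injector recognises; which import the factory in the question used is not shown there.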