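Trouble creating an index with a custom analyzer using Jest

Time: 2014-11-15 18:25:37

Tags: java elasticsearch jest

Jest provides a brilliant asynchronous API for Elasticsearch, and we find it very useful. However, sometimes the resulting requests turn out to be slightly different from what we would expect.

Usually we don't care, since everything works fine, but in this case it does not.

I want to create an index with a custom ngram analyzer. When I do this following the Elasticsearch REST API docs, I make the call below: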

curl -XPUT 'localhost:9200/test' --data '
{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "keyword_search": {
          "type":     "edge_ngram",
          "min_gram": 3,
          "max_gram": 15
        }
      },
      "analyzer": {
        "keyword": {
          "type":      "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "keyword_search"
          ]
        }
      }
    }
  }
}'

Then I confirm that the analyzer is configured properly using:

curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'

In response I receive multiple tokens such as exp, expe, expec and so on.
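For reference, the token stream that this analyzer chain should produce can be approximated in plain Java. This is only a rough sketch of the whitespace tokenizer, lowercase filter and edge n-gram filter (min_gram 3, max_gram 15) configured above, not Elasticsearch's actual implementation; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNgramSketch {

    // Rough stand-in for: whitespace tokenizer -> lowercase -> edge n-grams.
    // Words shorter than minGram produce no tokens at all.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            String lower = word.toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, lower.length());
            // Emit every prefix of the word from minGram up to longest characters.
            for (int len = minGram; len <= longest; len++) {
                tokens.add(lower.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same text and gram sizes as in the _analyze call above.
        System.out.println(analyze("Expecting many tokens", 3, 15));
    }
}
```

A correctly configured index should return many tokens for this text, whereas a single token suggests the custom analyzer was never registered.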

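Now, using the Jest client, I put the configuration JSON into a file on my classpath; its content is exactly the same as the body of the PUT request above. I execute a Jest action constructed like this: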

new CreateIndex.Builder(name)
            .settings(
                    ImmutableSettings.builder()
                            .loadFromClasspath(
                                    "settings.json"
                            ).build().getAsMap()
            ).build();

As a result:

  • Primo - checking with tcpdump, what is actually posted to Elasticsearch is (pretty-printed):

    {
      "settings.analysis.filter.keyword_search.max_gram": "15",
      "settings.analysis.filter.keyword_search.min_gram": "3",
      "settings.analysis.analyzer.keyword.tokenizer": "whitespace",
      "settings.analysis.filter.keyword_search.type": "edge_ngram",
      "settings.number_of_shards": "3",
      "settings.analysis.analyzer.keyword.filter.0": "lowercase",
      "settings.analysis.analyzer.keyword.filter.1": "keyword_search",
      "settings.analysis.analyzer.keyword.type": "custom"
    }
    
  • Secundo - the resulting index settings are:

    {
      "test": {
        "settings": {
          "index": {
            "settings": {
              "analysis": {
                "filter": {
                  "keyword_search": {
                    "type": "edge_ngram",
                    "min_gram": "3",
                    "max_gram": "15"
                  }
                },
                "analyzer": {
                  "keyword": {
                    "filter": [
                      "lowercase",
                      "keyword_search"
                    ],
                    "type": "custom",
                    "tokenizer": "whitespace"
                  }
                }
              },
              "number_of_shards": "3"   <-- the only difference from the one created with rest call
            },
            "number_of_shards": "3",
            "number_of_replicas": "0",
            "version": {"created": "1030499"},
            "uuid": "Glqf6FMuTWG5EH2jarVRWA"
          }
        }
      }
    }
    
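  • Tertio - checking the analyzer with curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens' I get just one token!

Question 1. What is the reason that Jest does not just post my original settings JSON, but a processed version of it instead?

Question 2. Why do the settings generated by Jest not work?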

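1 Answer:

Answer 0: (score: 8)

Glad you found Jest useful; please see my answer below.

Question 1. What is the reason that Jest does not just post my original settings JSON, but a processed version of it instead?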

It is not Jest but Elasticsearch's ImmutableSettings class that does this. See:

    Map test = ImmutableSettings.builder()
            .loadFromSource("{\n" +
                    "  \"settings\": {\n" +
                    "    \"number_of_shards\": 3,\n" +
                    "    \"analysis\": {\n" +
                    "      \"filter\": {\n" +
                    "        \"keyword_search\": {\n" +
                    "          \"type\":     \"edge_ngram\",\n" +
                    "          \"min_gram\": 3,\n" +
                    "          \"max_gram\": 15\n" +
                    "        }\n" +
                    "      },\n" +
                    "      \"analyzer\": {\n" +
                    "        \"keyword\": {\n" +
                    "          \"type\":      \"custom\",\n" +
                    "          \"tokenizer\": \"whitespace\",\n" +
                    "          \"filter\": [\n" +
                    "            \"lowercase\",\n" +
                    "            \"keyword_search\"\n" +
                    "          ]\n" +
                    "        }\n" +
                    "      }\n" +
                    "    }\n" +
                    "  }\n" +
                    "}").build().getAsMap();
    System.out.println("test = " + test);
  

This prints:

    test = {
        settings.analysis.filter.keyword_search.type=edge_ngram,
        settings.number_of_shards=3,
        settings.analysis.analyzer.keyword.filter.0=lowercase,
        settings.analysis.analyzer.keyword.filter.1=keyword_search,
        settings.analysis.analyzer.keyword.type=custom,
        settings.analysis.analyzer.keyword.tokenizer=whitespace,
        settings.analysis.filter.keyword_search.max_gram=15,
        settings.analysis.filter.keyword_search.min_gram=3
    }

Question 2. Why do the settings generated by Jest not work?

Because your usage of the settings JSON/map is not the intended one. I have created this test to reproduce your case (it is a bit long, but bear with me):

    @Test
    public void createIndexTemp() throws IOException {
        String index = "so_q_26949195";
        String settingsAsString = "{\n" +
                "  \"settings\": {\n" +
                "    \"number_of_shards\": 3,\n" +
                "    \"analysis\": {\n" +
                "      \"filter\": {\n" +
                "        \"keyword_search\": {\n" +
                "          \"type\":     \"edge_ngram\",\n" +
                "          \"min_gram\": 3,\n" +
                "          \"max_gram\": 15\n" +
                "        }\n" +
                "      },\n" +
                "      \"analyzer\": {\n" +
                "        \"keyword\": {\n" +
                "          \"type\":      \"custom\",\n" +
                "          \"tokenizer\": \"whitespace\",\n" +
                "          \"filter\": [\n" +
                "            \"lowercase\",\n" +
                "            \"keyword_search\"\n" +
                "          ]\n" +
                "        }\n" +
                "      }\n" +
                "    }\n" +
                "  }\n" +
                "}";
        // Flattened form of the same JSON; every key keeps the "settings." prefix.
        Map settingsAsMap = ImmutableSettings.builder()
                .loadFromSource(settingsAsString).build().getAsMap();

        // Case 1: settings sent as the raw JSON string.
        CreateIndex createIndex = new CreateIndex.Builder(index)
                .settings(settingsAsString)
                .build();
        JestResult result = client.execute(createIndex);
        assertTrue(result.getErrorMessage(), result.isSucceeded());

        GetSettings getSettings = new GetSettings.Builder().addIndex(index).build();
        result = client.execute(getSettings);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString());

        // The custom "keyword" analyzer of the index should yield many tokens.
        Analyze analyze = new Analyze.Builder()
                .index(index)
                .analyzer("keyword")
                .source("Expecting many tokens")
                .build();
        result = client.execute(analyze);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
        assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);

        // Without an index, the built-in "keyword" analyzer yields a single token.
        analyze = new Analyze.Builder()
                .analyzer("keyword")
                .source("Expecting single token")
                .build();
        result = client.execute(analyze);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
        assertTrue("Expected single token but got " + actualTokens, actualTokens == 1);

        admin().indices().delete(new DeleteIndexRequest(index)).actionGet();

        // Case 2: the same settings sent as the flattened map.
        createIndex = new CreateIndex.Builder(index)
                .settings(settingsAsMap)
                .build();
        result = client.execute(createIndex);
        assertTrue(result.getErrorMessage(), result.isSucceeded());

        getSettings = new GetSettings.Builder().addIndex(index).build();
        result = client.execute(getSettings);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString());

        analyze = new Analyze.Builder()
                .index(index)
                .analyzer("keyword")
                .source("Expecting many tokens")
                .build();
        result = client.execute(analyze);
        assertTrue(result.getErrorMessage(), result.isSucceeded());
        actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
        // This is the assertion that fails when the flattened map is used.
        assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
    }

When you run it, you will see that in the case where settingsAsMap is used the actual settings are completely wrong (settings contains another settings, which is your JSON, but the two should have been merged), and so the analysis fails.

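Why is this not the intended usage?

Simply because that is how Elasticsearch behaves in this situation. If the settings data is flattened (as the ImmutableSettings class does by default), it should not have the top-level element settings; if the data is not flattened, it may have that top-level element (which is why the test case with settingsAsString works).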
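To make the flattening concrete, here is a minimal plain-Java sketch of how a nested structure turns into dotted keys. It only mimics what ImmutableSettings appears to do, judging by the output shown in this thread; it is not Elasticsearch's code, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenSketch {

    // Flattens nested maps and lists into "a.b.0"-style keys with string values.
    static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            // Array elements are keyed by their position: filter.0, filter.1, ...
            for (int i = 0; i < list.size(); i++) {
                flattenInto(join(prefix, Integer.toString(i)), list.get(i), out);
            }
        } else {
            out.put(prefix, String.valueOf(value));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        Map<String, Object> keyword = new LinkedHashMap<>();
        keyword.put("filter", Arrays.asList("lowercase", "keyword_search"));
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("keyword", keyword);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings); // the top-level wrapper from the question

        // Every flattened key carries the "settings." prefix of the wrapper.
        flatten(root).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Because the wrapper survives as a key prefix, the flattened map no longer looks like a request body with a settings section; each entry is an ordinary setting whose name happens to begin with settings.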

TL;DR:

Your settings JSON should not include the top-level "settings" element (if you run it through ImmutableSettings).
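If editing settings.json is not an option, one possible workaround is to drop the prefix from the flattened keys before handing the map to CreateIndex.Builder. The helper below is hypothetical (it is not part of Jest or Elasticsearch) and is only a sketch: it assumes the flattened map has String keys and values, as in the output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SettingsPrefixSketch {

    // Removes a leading "settings." from every key; other keys are kept as-is.
    static Map<String, String> stripTopLevelSettings(Map<String, String> flat) {
        final String prefix = "settings.";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            String stripped = key.startsWith(prefix) ? key.substring(prefix.length()) : key;
            result.put(stripped, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.analyzer.keyword.type", "custom");
        flat.put("settings.analysis.analyzer.keyword.filter.0", "lowercase");

        // Keys now start at number_of_shards / analysis, without the wrapper.
        stripTopLevelSettings(flat).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The simpler route is still to remove the wrapper from the JSON file itself, so that number_of_shards and analysis are its top-level keys.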