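Jest provides a brilliant asynchronous API for Elasticsearch, and we find it very useful. However, the resulting requests sometimes turn out to be slightly different from what we would expect.
Usually we don't care, since everything works fine, but in this case it did not.
I want to create an index with a custom ngram analyzer. When I do this following the Elasticsearch REST API docs, I make the call below: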
curl -XPUT 'localhost:9200/test' --data '
{
"settings": {
"number_of_shards": 3,
"analysis": {
"filter": {
"keyword_search": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
}
},
"analyzer": {
"keyword": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"keyword_search"
]
}
}
}
}
}'
Then I confirm that the analyzer is configured properly using:
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting%20many%20tokens'
In response I get multiple tokens such as exp, expe, expec and so on.
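To make the expected token list concrete, here is a minimal, self-contained sketch of what the configured chain (whitespace tokenizer, lowercase, edge_ngram with min_gram 3 and max_gram 15) should produce. It only imitates the behavior in plain Java and is not Elasticsearch code; the class and method names are hypothetical, and it assumes words shorter than min_gram yield no tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: imitates whitespace tokenizer + lowercase + edge_ngram. */
final class EdgeNgramSketch {

    /** Returns the leading substrings (edge n-grams) of every whitespace-separated word. */
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input produces no tokens
            }
            String lower = word.toLowerCase(Locale.ROOT);
            int longest = Math.min(maxGram, lower.length());
            // Assumption: words shorter than minGram are dropped entirely.
            for (int len = minGram; len <= longest; len++) {
                tokens.add(lower.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same text and gram sizes as in the question.
        System.out.println(analyze("Expecting many tokens", 3, 15));
    }
}
```

For the text used in the question this yields 13 tokens, starting with exp, expe, expec, which is why a single-token response signals that the custom analyzer was not applied.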
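Now, using the Jest client, I put the config JSON into a file on my classpath; its content is exactly the same as the body of the PUT request above. I execute a Jest action constructed like this: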
new CreateIndex.Builder(name)
.settings(
ImmutableSettings.builder()
.loadFromClasspath(
"settings.json"
).build().getAsMap()
).build();
Result
Primo - checking with tcpdump shows that what is actually POSTed to Elasticsearch is (pretty-printed):
{
"settings.analysis.filter.keyword_search.max_gram": "15",
"settings.analysis.filter.keyword_search.min_gram": "3",
"settings.analysis.analyzer.keyword.tokenizer": "whitespace",
"settings.analysis.filter.keyword_search.type": "edge_ngram",
"settings.number_of_shards": "3",
"settings.analysis.analyzer.keyword.filter.0": "lowercase",
"settings.analysis.analyzer.keyword.filter.1": "keyword_search",
"settings.analysis.analyzer.keyword.type": "custom"
}
Secundo - the resulting index settings are:
{
"test": {
"settings": {
"index": {
"settings": {
"analysis": {
"filter": {
"keyword_search": {
"type": "edge_ngram",
"min_gram": "3",
"max_gram": "15"
}
},
"analyzer": {
"keyword": {
"filter": [
"lowercase",
"keyword_search"
],
"type": "custom",
"tokenizer": "whitespace"
}
}
},
"number_of_shards": "3" <-- the only difference from the one created with rest call
},
"number_of_shards": "3",
"number_of_replicas": "0",
"version": {"created": "1030499"},
"uuid": "Glqf6FMuTWG5EH2jarVRWA"
}
}
}
}
Tertio - checking the analyzer with curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting%20many%20tokens'
I get just one token!
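Question 1. What is the reason Jest does not post my original settings JSON, but a processed one instead?
Question 2. Why do the settings generated by Jest not work?
Answer 0 (score: 8)
Glad you found Jest useful; please see my answer below.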
Question 1. What is the reason Jest does not post my original settings JSON, but a processed one instead?
It is not Jest but Elasticsearch's ImmutableSettings that does this. See:
Map<String, String> test = ImmutableSettings.builder()
.loadFromSource("{\n" +
" \"settings\": {\n" +
" \"number_of_shards\": 3,\n" +
" \"analysis\": {\n" +
" \"filter\": {\n" +
" \"keyword_search\": {\n" +
" \"type\": \"edge_ngram\",\n" +
" \"min_gram\": 3,\n" +
" \"max_gram\": 15\n" +
" }\n" +
" },\n" +
" \"analyzer\": {\n" +
" \"keyword\": {\n" +
" \"type\": \"custom\",\n" +
" \"tokenizer\": \"whitespace\",\n" +
" \"filter\": [\n" +
" \"lowercase\",\n" +
" \"keyword_search\"\n" +
" ]\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}").build().getAsMap();
System.out.println("test = " + test);
Output:
test = {
settings.analysis.filter.keyword_search.type=edge_ngram,
settings.number_of_shards=3,
settings.analysis.analyzer.keyword.filter.0=lowercase,
settings.analysis.analyzer.keyword.filter.1=keyword_search,
settings.analysis.analyzer.keyword.type=custom,
settings.analysis.analyzer.keyword.tokenizer=whitespace,
settings.analysis.filter.keyword_search.max_gram=15,
settings.analysis.filter.keyword_search.min_gram=3
}
Question 2. Why do the settings generated by Jest not work?
Because your usage of the settings JSON/map is not the intended case. I have created this test to reproduce your case (it is a bit long, but bear with me):
@Test
public void createIndexTemp() throws IOException {
String index = "so_q_26949195";
String settingsAsString = "{\n" +
" \"settings\": {\n" +
" \"number_of_shards\": 3,\n" +
" \"analysis\": {\n" +
" \"filter\": {\n" +
" \"keyword_search\": {\n" +
" \"type\": \"edge_ngram\",\n" +
" \"min_gram\": 3,\n" +
" \"max_gram\": 15\n" +
" }\n" +
" },\n" +
" \"analyzer\": {\n" +
" \"keyword\": {\n" +
" \"type\": \"custom\",\n" +
" \"tokenizer\": \"whitespace\",\n" +
" \"filter\": [\n" +
" \"lowercase\",\n" +
" \"keyword_search\"\n" +
" ]\n" +
" }\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
Map settingsAsMap = ImmutableSettings.builder()
.loadFromSource(settingsAsString).build().getAsMap();
CreateIndex createIndex = new CreateIndex.Builder(index)
.settings(settingsAsString)
.build();
JestResult result = client.execute(createIndex);
assertTrue(result.getErrorMessage(), result.isSucceeded());
GetSettings getSettings = new GetSettings.Builder().addIndex(index).build();
result = client.execute(getSettings);
assertTrue(result.getErrorMessage(), result.isSucceeded());
System.out.println("SETTINGS SENT AS STRING settingsResponse = " + result.getJsonString());
Analyze analyze = new Analyze.Builder()
.index(index)
.analyzer("keyword")
.source("Expecting many tokens")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
Integer actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
analyze = new Analyze.Builder()
.analyzer("keyword")
.source("Expecting single token")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected single token but got " + actualTokens, actualTokens == 1);
admin().indices().delete(new DeleteIndexRequest(index)).actionGet();
createIndex = new CreateIndex.Builder(index)
.settings(settingsAsMap)
.build();
result = client.execute(createIndex);
assertTrue(result.getErrorMessage(), result.isSucceeded());
getSettings = new GetSettings.Builder().addIndex(index).build();
result = client.execute(getSettings);
assertTrue(result.getErrorMessage(), result.isSucceeded());
System.out.println("SETTINGS AS MAP settingsResponse = " + result.getJsonString());
analyze = new Analyze.Builder()
.index(index)
.analyzer("keyword")
.source("Expecting many tokens")
.build();
result = client.execute(analyze);
assertTrue(result.getErrorMessage(), result.isSucceeded());
actualTokens = result.getJsonObject().getAsJsonArray("tokens").size();
assertTrue("Expected multiple tokens but got " + actualTokens, actualTokens > 1);
}
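When you run it, you will see that in the case where settingsAsMap is used, the actual settings are completely wrong (settings contains another settings, which is your JSON, but the two should have been merged), and so the analysis fails.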
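The nesting seen in the generated index settings (index.settings.analysis...) is consistent with every submitted key that does not already start with index. being stored under that prefix. The following is a minimal sketch of that apparent behavior, inferred from the observed output; it is not Elasticsearch's actual code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: imitates how flattened keys appear to end up under "index.". */
final class IndexPrefixSketch {

    /** Prepends "index." to every key that does not already carry it. */
    static Map<String, String> withIndexPrefix(Map<String, String> submitted) {
        Map<String, String> stored = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : submitted.entrySet()) {
            String key = e.getKey();
            stored.put(key.startsWith("index.") ? key : "index." + key, e.getValue());
        }
        return stored;
    }

    public static void main(String[] args) {
        // Two of the flattened keys that were actually sent on the wire.
        Map<String, String> flattened = new LinkedHashMap<>();
        flattened.put("settings.analysis.analyzer.keyword.type", "custom");
        flattened.put("settings.number_of_shards", "3");
        // The analyzer lands at index.settings.analysis..., not index.analysis...
        System.out.println(withIndexPrefix(flattened));
    }
}
```

Under this reading, the analyzer definition ends up one level too deep, so a lookup of the analyzer named keyword presumably falls back to the built-in keyword analyzer, which would explain the single token.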
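Why is this not the intended usage?
Simply because that is how Elasticsearch behaves in this situation. If the settings data is flattened (as the ImmutableSettings class does by default), it should not have the top-level element settings. It may have that top-level element if the data is not flattened, which is why the test case with settingsAsString works.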
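To illustrate what "flattened" means here, the following self-contained sketch turns nested settings into dotted keys in the same format as the tcpdump capture. It only imitates that key format in plain Java and is not the ImmutableSettings implementation; FlattenSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: flattens nested maps and lists into dotted keys. */
final class FlattenSketch {

    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new TreeMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            // Objects contribute their field names as path segments.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                flattenInto(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Arrays contribute their element index as a path segment.
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(join(prefix, String.valueOf(i)), list.get(i), out);
            }
        } else {
            // Leaves are stored as strings, which is why 3 becomes "3".
            out.put(prefix, String.valueOf(value));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        Map<String, Object> keyword = new LinkedHashMap<>();
        keyword.put("type", "custom");
        keyword.put("filter", Arrays.asList("lowercase", "keyword_search"));
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("keyword", keyword);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings); // the problematic top-level element
        // Every resulting key starts with "settings.".
        flatten(root).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Because the top-level element survives flattening as a plain key prefix, nothing downstream can tell it apart from an ordinary setting name.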
TL;DR:
Your settings JSON should not include the top-level "settings" element (if you run it through ImmutableSettings).
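If editing the JSON file is not convenient, one possible alternative is to drop the leading settings. from each flattened key before handing the map to CreateIndex.Builder. This is a sketch under that assumption rather than a documented Jest or Elasticsearch recipe; the helper name is hypothetical, and it operates on a plain Map<String, String> such as the one getAsMap() returned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: removes a leading prefix from flattened setting keys. */
final class StripSettingsPrefix {

    /** Returns a copy of the map where keys starting with the prefix have it removed. */
    static Map<String, String> stripPrefix(Map<String, String> flat, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            String key = e.getKey();
            out.put(key.startsWith(prefix) ? key.substring(prefix.length()) : key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("settings.number_of_shards", "3");
        flat.put("settings.analysis.analyzer.keyword.tokenizer", "whitespace");
        // prints {number_of_shards=3, analysis.analyzer.keyword.tokenizer=whitespace}
        System.out.println(stripPrefix(flat, "settings."));
    }
}
```

The stripped map no longer carries the extra nesting level, so it should match what the settingsAsString case effectively produces; removing the wrapper from the JSON file itself remains the simpler fix.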