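I am fetching documents from Elasticsearch with the Java API. My Elasticsearch document contains the following code, and I am trying to search for it with the patterns below.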
code : MS-VMA1615-0D
Input : *VMA1615-0* -- I get the result (MS-VMA1615-0D).
Input : MS-VMA1615-0D -- I get the result (MS-VMA1615-0D).
Input : *VMA1615-0 -- I get the result (MS-VMA1615-0D).
Input : *VMA*-0* -- I get the result (MS-VMA1615-0D).
However, if I enter the input below, I get no results.
Input : VMA1615 -- I get no results.
I expect this to return the code MS-VMA1615-0D.
Please find the Java code I am using below:
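import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

private final String INDEX = "products";
private final String TYPE = "doc";

// Build a query_string query against the "code" field.
SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = null;
try {
    searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
    // Report the failure instead of silently discarding the message.
    System.err.println("Search failed: " + e.getLocalizedMessage());
}
Item item = null;
// Guard against a failed search so this line cannot throw a NullPointerException.
SearchHit[] searchHits = searchResponse != null
        ? searchResponse.getHits().getHits()
        : new SearchHit[0];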
Please find my mapping details:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
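Answer 0 (score: 1)
To do what you want, you will probably have to change the tokenizer. You are currently using the whitespace tokenizer, which keeps MS-VMA1615-0D as a single token because the code contains no spaces. Replace it with the pattern tokenizer, so that your new mapping looks like this: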
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "pattern",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
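Once the index has been recreated with this mapping and the documents reindexed, a query for VMA1615 will return MS-VMA1615-0D.
The pattern tokenizer, whose default pattern is \W+, splits the string "MS-VMA1615-0D" into "MS", "VMA1615" and "0D". So a query that contains any one of these tokens will return the document.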
POST _analyze
{
  "tokenizer": "pattern",
  "text": "MS-VMA1615-0D"
}
This will return:
{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA1615",
      "start_offset": 3,
      "end_offset": 10,
      "type": "word",
      "position": 1
    },
    {
      "token": "0D",
      "start_offset": 11,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
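To see why the tokenizer choice matters, the splitting step can be imitated in plain Java. This is only a sketch: `TokenizerSketch` and its `tokens` helper are hypothetical names, not part of the Elasticsearch API, and `String.split` merely approximates what the whitespace tokenizer and the pattern tokenizer (with its default `\W+` pattern) do before the lowercase filter runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not Elasticsearch code: it only imitates the
// splitting step of a tokenizer followed by the lowercase filter.
final class TokenizerSketch {

    // Split on a regex separator, drop empty pieces, lowercase the rest.
    static List<String> tokens(String text, String separatorRegex) {
        List<String> result = new ArrayList<>();
        for (String part : text.split(separatorRegex)) {
            if (!part.isEmpty()) {
                result.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";
        // Whitespace splitting: the code has no spaces, so it stays one term.
        System.out.println(tokens(code, "\\s+")); // [ms-vma1615-0d]
        // Splitting on non-word characters: the hyphens act as separators.
        System.out.println(tokens(code, "\\W+")); // [ms, vma1615, 0d]
    }
}
```

With the whitespace rule the code stays a single term, so a query for VMA1615 has nothing to match. With `\W+` the hyphens act as separators and `vma1615` becomes a term of its own.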
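Based on your comment:
Elasticsearch doesn't work that way. Elasticsearch stores terms and their corresponding documents in an inverted index, and the terms are whatever the analyzer produces at index time. With whitespace tokenization, for example, the text "Hi there I am a technocrat" is split into ["Hi", "there", "I", "am", "a", "technocrat"]. So the terms that get stored depend on how the text is tokenized. At query time, if I search for "technocrat" in the example above, I get the document back, because the inverted index holds that term and links it to my document. In your case, "VMA" is not stored as a term, so a search for it finds nothing.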
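The inverted-index idea above can be sketched in a few lines of Java. This is a toy model under stated assumptions, not how Elasticsearch is implemented: `InvertedIndexSketch` is a hypothetical class, it tokenizes on whitespace only, and it skips lowercasing and every other filter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: maps each stored term to the ids of the
// documents that contain it. Lookups are exact term matches.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize on whitespace and record the document id under each term.
    void index(int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of documents that contain exactly this term.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.index(1, "Hi there I am a technocrat");
        idx.index(2, "MS-VMA1615-0D");
        System.out.println(idx.search("technocrat"));    // [1]
        System.out.println(idx.search("VMA"));           // []
        System.out.println(idx.search("MS-VMA1615-0D")); // [2]
    }
}
```

The lookup for "VMA" comes back empty because only the whole string "MS-VMA1615-0D" was stored as a term for document 2.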
To make a search for "VMA" match as well, use the following mapping, which splits on hyphens and on digits:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}
To check:
POST products/_analyze
{
  "tokenizer": "my_pattern_tokenizer",
  "text": "MS-VMA1615-0D"
}
This will produce:
{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "D",
      "start_offset": 12,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
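As a rough check of what this mapping does at search time, the sketch below imitates the `-|\d` separator in plain Java. `CustomPatternSketch`, `analyze` and `matches` are hypothetical names, and `matches` is a simplification: it assumes a query matches when its analyzed terms share at least one term with the indexed value. Wildcards and scoring are not modeled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Elasticsearch code: imitates a pattern
// tokenizer with the separator -|\d followed by the lowercase filter.
final class CustomPatternSketch {
    static final String SEPARATOR = "-|\\d";

    // Split on hyphens and digits, drop empty pieces, lowercase the rest.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split(SEPARATOR)) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Assumption: a match means the analyzed query and the analyzed
    // indexed value have at least one term in common.
    static boolean matches(String indexedValue, String query) {
        Set<String> indexed = new HashSet<>(analyze(indexedValue));
        for (String term : analyze(query)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";
        System.out.println(analyze(code));            // [ms, vma, d]
        System.out.println(matches(code, "VMA"));     // true
        System.out.println(matches(code, "VMA1615")); // true
        System.out.println(matches(code, "VMA9999")); // true
    }
}
```

Because every digit is treated as a separator, the numeric part of the code is thrown away, so in this sketch VMA9999 matches MS-VMA1615-0D as well. Whether that is acceptable depends on the data.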