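I want to index the month field of BibTeX entries into Elasticsearch and make it searchable with range queries. This requires the underlying field type to be a numeric data type; in my case, short would be sufficient.
The canonical form of the BibTeX month field is a three-character abbreviation, so I tried using a char_filter like this: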
...
"char_filter": {
  "month_char_filter": {
    "type": "mapping",
    "mappings": [
      "jan => 1",
      "feb => 2",
      "mar => 3",
      ...
      "nov => 11",
      "dec => 12"
    ]
  }
...
"normalizer": {
  "month_normalizer": {
    "type": "custom",
    "char_filter": [ "month_char_filter" ]
  },
and set up a mapping like this:
...
"month": {
  "type": "short",
  "normalizer": "month_normalizer"
},
...
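But this does not seem to work, because fields of type short support neither normalizers like this one nor analyzers.
So what would be the way to implement a mapping like the one shown in the char_filter section, so that range queries become possible?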
Answer 0 (score: 3)
Your approach makes sense intuitively. However, normalizers can only be applied to keyword fields, and analyzers only to text fields.
Another approach is to leverage ingest processors at indexing time and do the mapping with a script processor.
Below is a simulation of such a script processor, which creates a new field named monthNum based on the month found in the month field.
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            def mapping = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
            ctx.monthNum = mapping.indexOf(ctx.month) + 1;
          """
        }
      }
    ]
  },
  "docs": [
    { "_source": { "month": "feb" } },
    { "_source": { "month": "mar" } },
    { "_source": { "month": "jul" } },
    { "_source": { "month": "aug" } },
    { "_source": { "month": "nov" } },
    { "_source": { "month": "dec" } },
    { "_source": { "month": "xyz" } }
  ]
}
Resulting documents:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 2,
"month" : "feb"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 3,
"month" : "mar"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 7,
"month" : "jul"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 8,
"month" : "aug"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 11,
"month" : "nov"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 12,
"month" : "dec"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
},
{
"doc" : {
"_index" : "_index",
"_type" : "_type",
"_id" : "_id",
"_source" : {
"monthNum" : 0,
"month" : "xyz"
},
"_ingest" : {
"timestamp" : "2019-05-08T12:28:27.006Z"
}
}
}
]
}
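To go from the simulation to actual range queries, the same script can be registered as a named pipeline and the derived field mapped as short. The following is a minimal sketch and not part of the original answer: the pipeline name bibtex_month, the index name bibtex, and the sample document are made up for illustration, and the typeless mapping syntax assumes Elasticsearch 7.0 or later (on 6.x the properties would have to be nested under a mapping type name).

```
# Register the script from the simulation as a reusable pipeline.
# "bibtex_month" is a hypothetical pipeline name.
PUT _ingest/pipeline/bibtex_month
{
  "description": "Derive a numeric monthNum from the BibTeX month abbreviation",
  "processors": [
    {
      "script": {
        "source": """
          def mapping = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
          // indexOf returns -1 for unknown values, so those documents get monthNum 0
          ctx.monthNum = mapping.indexOf(ctx.month) + 1;
        """
      }
    }
  ]
}

# Hypothetical index: keep the original abbreviation and map the derived number as short.
PUT bibtex
{
  "mappings": {
    "properties": {
      "month":    { "type": "keyword" },
      "monthNum": { "type": "short" }
    }
  }
}

# Index a sample document through the pipeline.
PUT bibtex/_doc/1?pipeline=bibtex_month
{
  "month": "jul"
}

# Range query on the numeric field: March through August.
GET bibtex/_search
{
  "query": {
    "range": {
      "monthNum": { "gte": 3, "lte": 8 }
    }
  }
}
```

Since unrecognized abbreviations are stored as 0, as the xyz document in the simulation shows, a lower bound of 1 or higher keeps them out of the results.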