Elasticsearch - datetime mapping for "day of week"

Time: 2016-01-21 10:10:57

Tags: elasticsearch

I have the following property in a class:

public DateTime InsertedTimeStamp { get; set; }

with the following mapping in ES:

"insertedTimeStamp ":{
    "type":"date",
    "format":"yyyy-MM-ddTHH:mm:ssZ"
},

I want to run an aggregation that returns all the data grouped by "day of week", i.e. "Monday", "Tuesday", and so on.

I know I can use a 'script' in the aggregation call to do this (see here), but as I understand it, scripts carry a non-trivial performance hit when there are a lot of documents (which works against me here; think parsing log files).
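For reference, a minimal sketch of the kind of script-based aggregation I would like to avoid (the index name my_index is made up, and the Groovy-style script assumes dynamic scripting is enabled):

POST my_index/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "script": "doc['insertedTimeStamp'].date.dayOfWeek().getAsText()"
      }
    }
  }
}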

Is there a way to map the property with "sub-properties"? i.e., like I can with a string:

"somestring":{
    "type":"string",
    "analyzer":"full_word",
    "fields":{
        "partial":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word",
            "type":"string"
        },
        "partial_back":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word_back",
            "type":"string"
        },
        "partial_middle":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word_name",
            "type":"string"
        }
    }
},

all from a single property in the .NET code.

Can I do something similar to store the "full date" and then the "year", "month", "day", etc. separately (some kind of "script" at index time), or do I need to add more properties to the class and map each one individually? Is this what Transform used to do? (It is now deprecated, which would seem to suggest I need separate fields...)

2 Answers:

Answer 0: (score: 5)

This is definitely possible at index time using the pattern_capture token filter.

You first define one analyzer + token filter combination per date part, and assign each analyzer to a sub-field of the date field. Each token filter captures only the group it is interested in.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "year_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "year"
          ]
        },
        "month_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "month"
          ]
        },
        "day_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "day"
          ]
        },
        "hour_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "hour"
          ]
        },
        "minute_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "minute"
          ]
        },
        "second_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "second"
          ]
        }
      },
      "filter": {
        "year": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "(\\d{4})-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "month": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-(\\d{2})-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "day": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-(\\d{2})[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "hour": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT](\\d{2}):\\d{2}:\\d{2}[zZ]"
          ]
        },
        "minute": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:(\\d{2}):\\d{2}[zZ]"
          ]
        },
        "second": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:(\\d{2})[zZ]"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ssZ",
          "fields": {
            "year": {
              "type": "string",
              "analyzer": "year_analyzer"
            },
            "month": {
              "type": "string",
              "analyzer": "month_analyzer"
            },
            "day": {
              "type": "string",
              "analyzer": "day_analyzer"
            },
            "hour": {
              "type": "string",
              "analyzer": "hour_analyzer"
            },
            "minute": {
              "type": "string",
              "analyzer": "minute_analyzer"
            },
            "second": {
              "type": "string",
              "analyzer": "second_analyzer"
            }
          }
        }
      }
    }
  }
}

Then, when you index a date such as 2016-01-22T10:01:23Z, each of the date sub-fields gets populated with the relevant part, i.e.

  • date: 2016-01-22T10:01:23Z
  • date.year: 2016
  • date.month: 01
  • date.day: 22
  • date.hour: 10
  • date.minute: 01
  • date.second: 23
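For example, such a document could be indexed like this (a sketch that assumes the settings above were used to create an index named test_index, with the test type from the mapping):

POST test_index/test/1
{
  "date": "2016-01-22T10:01:23Z"
}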

Then you are free to aggregate on any of these sub-fields to get what you want.
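For instance, a minimal sketch of a terms aggregation on the day sub-field (again assuming the index is named test_index):

POST test_index/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "terms": {
        "field": "date.day"
      }
    }
  }
}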

Answer 1: (score: 2)

I think your only option seems to be a scripted upsert, so that you can run scripts at index time.

I created a basic index like this:

POST user_index
{
  "mappings": {
    "users": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ssZ"
        },
        "month":{
          "type" : "string"
        },
        "day_of_week" : {
          "type" : "string"
        },
        "name" : {
          "type" : "string"
        }
      }
    }
  }
}

Then you should index your documents like this:

POST user_index/users/111/_update/
{
  "scripted_upsert": true,
  "script": "ctx._source.month = DateTime.parse('2014-03-01T10:30:00').toString('MMMM');ctx._source.day_of_week = DateTime.parse('2014-03-01T10:30:00').dayOfWeek().getAsText()",
  "upsert": {
    "name": "Brad Smith",
    "timestamp": "2014-03-01T10:30:00Z"
  }
}

It will index the document like this (more on datetime manipulation here):

{
  "_index": "user_index",
  "_type": "users",
  "_id": "111",
  "_score": 1,
  "_source": {
    "timestamp": "2014-03-01T10:30:00Z",
    "day_of_week": "Saturday",
    "name": "Brad Smith",
    "month": "March"
  }
}

Now you can easily perform aggregations on day_of_week. Also note that you have to enable dynamic scripting for this; it is better to keep the script in the config/scripts folder and pass the timestamp in as a param. You may also want to put everything inside the script, depending on your requirements.
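For completeness, a rough sketch of such a terms aggregation on day_of_week (note that with the analyzed string mapping above, the returned terms will be the lowercased tokens, e.g. "saturday"):

POST user_index/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}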

Hope this helps!!