Check whether a date in a full-text field falls within a range

Time: 2019-09-08 11:18:00

Tags: elasticsearch nest

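Hello, I am new to the Elastic world and I am trying to figure out how to check whether a date inside a field such as "text" (which holds the whole indexed text) falls within a specific range.

Example: the field "text" of doc_1 contains "I was born on 27/05/1995", and I want to check whether this document contains a date between 20/05/1995 and 30/05/1995.

If that is not possible, how can I extract the date "27/05/1995" and store it in a new field when I index the document? Can you give me some tips on the best approach to indexing documents that contain dates?

Thanks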

2 Answers:

Answer 0 (score: 1)

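I think you have several options here. To search for documents that fall within your date range, you have to parse the dates out of the text and index them as a date field in Elasticsearch. You can do this inside your application before sending the documents to Elasticsearch, or you can look at ingest nodes. An ingest node lets you pre-process documents before they are indexed. https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html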
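The application-side option could look roughly like the sketch below. It assumes the dates appear as day/month/year (as in "27/05/1995"), and the field name `dates` and the helpers `extract_dates` and `build_document` are hypothetical names chosen for illustration; the target field would need to be mapped as `"type": "date"` in the index.

```python
import re
from datetime import date, datetime
from typing import List

# Assumed input format: day/month/year, e.g. "27/05/1995".
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_dates(text: str) -> List[date]:
    """Return every valid dd/mm/yyyy date found in the text."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        try:
            found.append(datetime.strptime(match.group(0), "%d/%m/%Y").date())
        except ValueError:
            # Looks like a date but is not a real one (e.g. 31/02/1995); skip it.
            continue
    return found


def build_document(text: str) -> dict:
    """Build the JSON body to index: the original text plus the parsed dates."""
    return {
        "text": text,
        # Hypothetical field name; ISO 8601 strings are accepted by a date field.
        "dates": [d.isoformat() for d in extract_dates(text)],
    }


doc = build_document("I was born on 27/05/1995")
print(doc)
```

The resulting dictionary can be sent as the document body with whichever client you use. Keeping the original text alongside the parsed dates means full-text search still works on the same document.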

Once the documents in Elasticsearch have a separate date field, you can search them with a range query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
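As a minimal sketch, the body of such a range query can be built like this. The field name `dates` and the helper `date_range_query` are assumptions for illustration; `gte` and `lte` are the inclusive bounds of the range query.

```python
import json
from datetime import date


def date_range_query(field: str, start: date, end: date) -> dict:
    """Build a search body matching documents whose date field lies in [start, end]."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.isoformat(),  # inclusive lower bound
                    "lte": end.isoformat(),    # inclusive upper bound
                }
            }
        }
    }


# The range from the question: 20 May 1995 to 30 May 1995.
body = date_range_query("dates", date(1995, 5, 20), date(1995, 5, 30))
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a `_search` request against the index.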

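Answer 1 (score: 0)

I used the regular expressions below to match certain dates in the text. The date format I am looking for is "yyyy-mm-dd"; you can adapt the inner span_multi clauses to look for whatever format you need. See the Elasticsearch documentation on span queries for more background.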

Mapping:

PUT testindex
{
  "mappings": {
    "properties": {
      "content":{
        "type": "text"
      }
    }
  }
}

Data:

[
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "a3PLFW0BY3127H1HVxyC",
        "_score" : 1.0,
        "_source" : {
          "content" : "I was born on 2019/09/01"
        }
      },
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "bXPLFW0BY3127H1HaBwp",
        "_score" : 1.0,
        "_source" : {
          "content" : "I was born on 2019/09/15"
        }
      },
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "w3PLFW0BY3127H1HeBzg",
        "_score" : 1.0,
        "_source" : {
          "content" : "I was born on 2019/09/20"
        }
      }
    ]

Query:

The three span_multi clauses below match the year, the month and the day, in that order; reorder them to match a different date format. The last regex matches days 15 through 20.

GET testindex/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "regexp": {
                "content": "(19|20)[0-9]{2}"
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "regexp": {
                "content": "0[1-9]|1[012]"
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "regexp": {
                "content": "1[5-9]|[2][0]"
              }
            }
          }
        }
      ],
      "slop": 0,
      "in_order": true
    }
  }
}

Result:

 [
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "bXPLFW0BY3127H1HaBwp",
        "_score" : 3.2095504,
        "_source" : {
          "content" : "I was born on 2019/09/15"
        }
      },
      {
        "_index" : "testindex",
        "_type" : "_doc",
        "_id" : "w3PLFW0BY3127H1HeBzg",
        "_score" : 3.2095504,
        "_source" : {
          "content" : "I was born on 2019/09/20"
        }
      }
    ]
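
To see why only the documents for the 15th and the 20th come back, the matching can be approximated locally. This is only a rough sketch: `tokenize` is a simplified stand-in for the analyzer (it lowercases and splits on non-alphanumeric characters, so "2019/09/15" becomes three terms), and `matches_span_near` is a hypothetical helper that mimics three adjacent, in-order terms with slop 0. Each pattern is matched against a whole term, as Elasticsearch regexp queries do.

```python
import re

# The three per-term patterns from the query, in order: year, month, day (15 to 20).
PATTERNS = [r"(19|20)[0-9]{2}", r"0[1-9]|1[012]", r"1[5-9]|[2][0]"]


def tokenize(text):
    # Simplified stand-in for the analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_span_near(text, patterns=PATTERNS):
    """True if consecutive terms match the patterns in order (slop 0)."""
    tokens = tokenize(text)
    width = len(patterns)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # fullmatch: the pattern must cover the entire term, not a substring.
        if all(re.fullmatch(p, t) for p, t in zip(patterns, window)):
            return True
    return False


docs = [
    "I was born on 2019/09/01",
    "I was born on 2019/09/15",
    "I was born on 2019/09/20",
]
matched = [d for d in docs if matches_span_near(d)]
print(matched)
```

Note that this approach matches term patterns rather than comparing real dates, so a range that crosses a month or year boundary needs more elaborate patterns than a range on an indexed date field would.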