Elasticsearch DSL query - Get all matching results

Time: 2018-03-09 19:11:02

Tags: elasticsearch elasticsearch-dsl

I am trying to search an index using a DSL query. Many documents match both the log criterion and the timestamp range.
I pass in dates and convert them to epoch milliseconds.
I am also specifying the size parameter in the DSL query.
If I specify 5000, it returns 5000 records in the time range, but there are more records than that in the specified range.
How can I retrieve all data matching the time range without having to specify the size?

My DSL query is as below.

GET localhost:9200/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "log": "SOME_VALUE"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "'"${fromDate}"'",
              "lte": "'"${toDate}"'",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  },
  "size": 5000
}

fromDate = 1519842600000
toDate = 1520533800000
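
The question mentions converting dates to epoch milliseconds but does not show that step. A minimal sketch of one way to do it, assuming GNU date is available (the -d option is not portable to BSD/macOS date); the helper name to_epoch_millis is hypothetical. The two sample values above happen to correspond to midnight at UTC+05:30 on 2018-03-01 and 2018-03-09, so those dates are used here.

```shell
# Hypothetical helper (not from the original post): convert a date string
# to epoch milliseconds. Assumes GNU date; whole-second precision only.
to_epoch_millis() {
  # date prints seconds since the epoch; append 000 to get milliseconds
  echo "$(date -d "$1" +%s)000"
}

fromDate=$(to_epoch_millis "2018-03-01 00:00:00 +0530")
toDate=$(to_epoch_millis "2018-03-09 00:00:00 +0530")
echo "fromDate = ${fromDate}"
echo "toDate = ${toDate}"
```

Giving the UTC offset explicitly keeps the result independent of the time zone of the machine running the script.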

1 answer:

Answer 0: (score: 1)

I could not get the scan API or scroll mode to work; they did not return the expected results either.
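
For comparison, here is a rough sketch of how a scroll loop is commonly driven from a shell script. It is not what this answer ended up using. The endpoints _search?scroll=1m and _search/scroll belong to Elasticsearch's scroll API; the names ES_URL, es_post, extract_scroll_id and scroll_all are hypothetical, and the sed/grep parsing assumes the compact single-line JSON that Elasticsearch returns by default (no ?pretty).

```shell
# Assumption: same host as in the question; override by exporting ES_URL.
ES_URL="${ES_URL:-localhost:9200}"

# POST a JSON body ($2) to a path ($1) and print the raw response.
es_post() {
  curl -s -H 'Content-Type: application/json' -XPOST "${ES_URL}$1" -d "$2"
}

# Read a compact JSON response on stdin and print its _scroll_id.
extract_scroll_id() {
  sed -n 's/.*"_scroll_id":"\([^"]*\)".*/\1/p'
}

# scroll_all QUERY_JSON: print one raw response page per line,
# stopping at the first page whose hits array is empty.
scroll_all() {
  local page
  local scroll_id
  page=$(es_post "/_search?scroll=1m" "$1")
  while ! printf '%s' "$page" | grep -q '"hits":\[\]'; do
    printf '%s\n' "$page"
    scroll_id=$(printf '%s' "$page" | extract_scroll_id)
    # Stop if the response carried no scroll id (for example, an error body)
    [ -n "$scroll_id" ] || break
    page=$(es_post "/_search/scroll" "{\"scroll\":\"1m\",\"scroll_id\":\"${scroll_id}\"}")
  done
}
```

A call such as scroll_all '{"size":1000,"query":{"match_all":{}}}' > pages.txt would write each page of up to 1000 hits as one line; the query body and output file name here are only placeholders.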

I finally worked out a way to capture the hit count first and then pass it as a parameter when extracting the data.

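# The Content-Type header is required by Elasticsearch 6.0 and later
curl -s -H 'Content-Type: application/json' -XGET 'localhost:9200/_count' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "log": "SOME_VALUE"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "'"${fromDate}"'",
              "lte": "'"${toDate}"'",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}' > count_size.txt
# The response starts with {"count":N,...}: keep the first field, then drop the key
size_count=$(cut -d "," -f1 count_size.txt | cut -d ":" -f2)
echo "Total hits matching this criteria is ${size_count}"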

This gives me the size_count value. If it is less than 10000, I extract the data; otherwise I narrow the time range being extracted.
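
The check described here can be sketched as two small functions. The 10000 threshold matches Elasticsearch's default index.max_result_window setting, which caps from + size for a normal search; the sketch assumes that default has not been changed. The names MAX_WINDOW, parse_count and fits_in_one_request are hypothetical, and the sample response is made up for illustration.

```shell
# Assumption: index.max_result_window is left at its default of 10000.
MAX_WINDOW=10000

# Read a compact _count response on stdin and print the count,
# using the same cut pipeline as the answer's script.
parse_count() {
  cut -d "," -f1 | cut -d ":" -f2
}

# Succeed (exit 0) when the hit count is small enough for one size=N search.
fits_in_one_request() {
  [ "$1" -lt "$MAX_WINDOW" ]
}

# Illustrative response body, not real output from the cluster in question
size_count=$(echo '{"count":4321,"_shards":{"total":5,"successful":5,"failed":0}}' | parse_count)
if fits_in_one_request "$size_count"; then
  echo "fetch ${size_count} hits in one request"
else
  echo "narrow the time range and count again"
fi
```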

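# Same query as the count request, now with size set to the counted hits
curl -s -H 'Content-Type: application/json' -XGET 'localhost:9200/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "log": "SOME_VALUE"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "'"${fromDate}"'",
              "lte": "'"${toDate}"'",
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  },
  "size": '"${size_count}"'
}'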

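If I need a large amount of data over a long period, I have to run this with several different date ranges and combine the results to get the overall report I need.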
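
Running the script over several date ranges can be automated by splitting the overall range into fixed-size windows. A minimal sketch, assuming one-day windows are small enough to stay under the 10000-hit limit (that depends entirely on the data); the helper name split_range is hypothetical.

```shell
# Hypothetical helper (not from the original post): split an inclusive
# epoch-millis range into consecutive windows of at most STEP_MS each.
# Usage: split_range FROM_MS TO_MS STEP_MS; prints one "gte lte" pair per line.
split_range() {
  local from
  local to
  local step
  local end
  from=$1
  to=$2
  step=$3
  while [ "$from" -le "$to" ]; do
    end=$((from + step - 1))
    # Clamp the last window to the end of the requested range
    if [ "$end" -gt "$to" ]; then
      end=$to
    fi
    echo "$from $end"
    from=$((end + 1))
  done
}

# One-day windows over the range from the question; each pair would be
# substituted for fromDate/toDate in the count and search requests.
split_range 1519842600000 1520533800000 86400000 | while read -r gte lte; do
  echo "window: gte=${gte} lte=${lte}"
done
```

Because gte and lte are both inclusive, each window ends one millisecond before the next begins, so no document is fetched twice.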

The complete code is a shell script, which makes it simpler for me to use.