Elasticsearch query based on the difference between two values

Time: 2019-11-19 18:20:28

Tags: python elasticsearch

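I'm working with the 2017 UK general election data: 650 constituencies, each with results for every party that stood. An example is below. I want to write a query that does the following:
1) Select all seats where the Conservatives have the majority.
2) Calculate the difference between the highest and second-highest vote counts.
3) Then return a dataset of size 10 of seats where the Conservatives have the majority.

Here is a sample object for one seat: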

{
  "_shards": {
    "failed": 0,
    "skipped": 0,
    "successful": 1,
    "total": 1
  },
  "hits": {
    "hits": [
      {
        "_id": "hSvqMG4BaIAfLxq_1XtL",
        "_index": "election",
        "_score": 6.5991144,
        "_source": {
          "code": "E14000620",
          "constituency": "Carlisle",
          "first_name": "Peter Carlyle",
          "last_name": "THORNTON",
          "pano": "130",
          "party": "Liberal Democrats",
          "rank": "0",
          "votes": "1256"
        },
        "_type": "_doc"
      },
      {
        "_id": "hivqMG4BaIAfLxq_1XtL",
        "_index": "election",
        "_score": 6.5991144,
        "_source": {
          "code": "E14000620",
          "constituency": "Carlisle",
          "first_name": "Fiona Rachel",
          "last_name": "MILLS",
          "pano": "130",
          "party": "UKIP",
          "rank": "0",
          "votes": "1455"
        },
        "_type": "_doc"
      },
      {
        "_id": "hyvqMG4BaIAfLxq_1XtL",
        "_index": "election",
        "_score": 6.5991144,
        "_source": {
          "code": "E14000620",
          "constituency": "Carlisle",
          "first_name": "Ruth Elizabeth",
          "last_name": "ALCROFT",
          "pano": "130",
          "party": "Labour",
          "rank": "0",
          "votes": "18873"
        },
        "_type": "_doc"
      },
      {
        "_id": "iCvqMG4BaIAfLxq_1XtL",
        "_index": "election",
        "_score": 6.5991144,
        "_source": {
          "code": "E14000620",
          "constituency": "Carlisle",
          "first_name": "Andrew John",
          "last_name": "STEVESON",
          "pano": "130",
          "party": "Conservative",
          "rank": "0",
          "votes": "21472"
        },
        "_type": "_doc"
      }
    ],
    "max_score": 6.5991144,
    "total": {
      "relation": "eq",
      "value": 4
    }
  },
  "timed_out": false,
  "took": 5
}
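Note that `votes` comes back as a string in these documents, so it has to be converted before any comparison or subtraction. As a rough sketch of step 2 for a single seat, assuming a response shaped like the one above (the helper name `majority_from_hits` is made up for illustration and is not part of any library):

```python
def majority_from_hits(hits):
    """Return (winner_party, runner_up_party, majority) for one constituency."""
    # Each hit carries the candidate document under "_source".
    candidates = [hit["_source"] for hit in hits]
    # "votes" is a string in the sample data, so rank on its integer value.
    ranked = sorted(candidates, key=lambda c: int(c["votes"]), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidates to compute a majority")
    winner, runner_up = ranked[0], ranked[1]
    majority = int(winner["votes"]) - int(runner_up["votes"])
    return winner["party"], runner_up["party"], majority


# Trimmed copy of the Carlisle response (only the fields used here).
carlisle_response = {
    "hits": {"hits": [
        {"_source": {"party": "Liberal Democrats", "votes": "1256"}},
        {"_source": {"party": "UKIP", "votes": "1455"}},
        {"_source": {"party": "Labour", "votes": "18873"}},
        {"_source": {"party": "Conservative", "votes": "21472"}},
    ]}
}

print(majority_from_hits(carlisle_response["hits"]["hits"]))
# -> ('Conservative', 'Labour', 2599)
```

Ranking in Python like this does not depend on how `votes` is mapped in the index, which matters if the field was indexed as text.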

This is the best query I've managed so far:

from elasticsearch import Elasticsearch

# Assumption: a local single-node cluster; adjust the URL for your setup.
es = Elasticsearch("http://localhost:9200")

def filter():
    district = 'Carlisle'
    # Fetch the candidates for one constituency, highest vote count first.
    res = es.search(index="election", body={
        "size": 10,
        "query": {"match": {"constituency": district}},
        "sort": [{"votes": {"order": "desc"}}],
        "aggs": {
            "group_by_party": {
                # Buckets on votes.keyword; the result is not read below.
                "terms": {"field": "votes.keyword"}
            }
        }
    })
    hits = res['hits']['hits']
    # After sorting, the first hit is the winner and the second is the runner-up.
    p1 = hits[0]["_source"]["party"]
    p2 = hits[1]["_source"]["party"]
    p1_vote = int(hits[0]["_source"]["votes"])
    p2_vote = int(hits[1]["_source"]["votes"])
    majority = p1_vote - p2_vote
    return f"{p2} {p2_vote} {p1} {p1_vote} majority: {majority}"

This returns:

Labour 18873 Conservative 21472 majority: 2599

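However, it doesn't quite do what I want. I want to apply this logic and return 10 constituency seats where the Conservatives have the majority. I don't know how to proceed. I don't know whether I should combine Python with an ES query, or use an ES query that does all the work by itself (preferred).

Thanks for your help!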
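One possible direction, sketched here with several assumptions, is to let Elasticsearch group candidates by constituency and keep only the top two per seat (a `terms` aggregation with a `top_hits` sub-aggregation), then do the subtraction and the party filter in Python. The sketch assumes `constituency` has a `keyword` sub-field and that `votes` is mapped as a numeric type so the sort is numeric rather than lexicographic; neither is confirmed by the sample above. The function names and the "smallest majority first" ordering are also only illustrative choices.

```python
def build_top_two_query(max_constituencies=650):
    """Request body: one bucket per constituency with its two highest-vote candidates."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the raw hits
        "aggs": {
            "by_constituency": {
                # ASSUMPTION: "constituency" has a "keyword" sub-field.
                "terms": {"field": "constituency.keyword", "size": max_constituencies},
                "aggs": {
                    "top_two": {
                        "top_hits": {
                            "size": 2,
                            # ASSUMPTION: "votes" is mapped as a numeric type.
                            "sort": [{"votes": {"order": "desc"}}],
                            "_source": {"includes": ["party", "votes"]},
                        }
                    }
                },
            }
        },
    }


def conservative_majorities(response, limit=10):
    """Return up to `limit` (constituency, runner_up_party, majority) tuples."""
    rows = []
    for bucket in response["aggregations"]["by_constituency"]["buckets"]:
        top = [hit["_source"] for hit in bucket["top_two"]["hits"]["hits"]]
        # Skip seats with a single candidate or a non-Conservative winner.
        if len(top) < 2 or top[0]["party"] != "Conservative":
            continue
        majority = int(top[0]["votes"]) - int(top[1]["votes"])
        rows.append((bucket["key"], top[1]["party"], majority))
    # ASSUMPTION: smallest majorities first; reverse the sort for the largest.
    rows.sort(key=lambda row: row[2])
    return rows[:limit]


# With a live cluster this would be used roughly as:
#   res = es.search(index="election", body=build_top_two_query())
#   print(conservative_majorities(res))
```

This keeps the query to a single request while leaving the "difference between first and second" step on the Python side, since `top_hits` results cannot be fed into pipeline aggregations such as `bucket_selector`.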

0 answers:

No answers