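I'm working with UK general election data from 2017: 650 constituencies, with results for every party in each one (example below). I want to write a query that:
1) selects all seats won by the Conservatives;
2) calculates the difference (the majority) between the highest and second-highest vote counts;
3) returns a dataset of 10 of those Conservative seats.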
Here is an example object for one seat:
{
"_shards": {
"failed": 0,
"skipped": 0,
"successful": 1,
"total": 1
},
"hits": {
"hits": [
{
"_id": "hSvqMG4BaIAfLxq_1XtL",
"_index": "election",
"_score": 6.5991144,
"_source": {
"code": "E14000620",
"constituency": "Carlisle",
"first_name": "Peter Carlyle",
"last_name": "THORNTON",
"pano": "130",
"party": "Liberal Democrats",
"rank": "0",
"votes": "1256"
},
"_type": "_doc"
},
{
"_id": "hivqMG4BaIAfLxq_1XtL",
"_index": "election",
"_score": 6.5991144,
"_source": {
"code": "E14000620",
"constituency": "Carlisle",
"first_name": "Fiona Rachel",
"last_name": "MILLS",
"pano": "130",
"party": "UKIP",
"rank": "0",
"votes": "1455"
},
"_type": "_doc"
},
{
"_id": "hyvqMG4BaIAfLxq_1XtL",
"_index": "election",
"_score": 6.5991144,
"_source": {
"code": "E14000620",
"constituency": "Carlisle",
"first_name": "Ruth Elizabeth",
"last_name": "ALCROFT",
"pano": "130",
"party": "Labour",
"rank": "0",
"votes": "18873"
},
"_type": "_doc"
},
{
"_id": "iCvqMG4BaIAfLxq_1XtL",
"_index": "election",
"_score": 6.5991144,
"_source": {
"code": "E14000620",
"constituency": "Carlisle",
"first_name": "Andrew John",
"last_name": "STEVESON",
"pano": "130",
"party": "Conservative",
"rank": "0",
"votes": "21472"
},
"_type": "_doc"
}
],
"max_score": 6.5991144,
"total": {
"relation": "eq",
"value": 4
}
},
"timed_out": false,
"took": 5
}
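Because `votes` is stored as a string in `_source` ("21472"), one option is to rank a seat's candidates in Python instead of relying on Elasticsearch's sort. A minimal sketch, assuming `hits` is the `res['hits']['hits']` list for a single constituency, shaped like the sample above; `seat_result` is a hypothetical helper name:

```python
def seat_result(hits):
    """Return (winner, winner_votes, runner_up, runner_up_votes, majority).

    `hits` is assumed to hold every candidate in one constituency, in the
    shape of res['hits']['hits'] from the sample response.
    """
    # votes are strings in _source, so rank candidates by their integer value
    ranked = sorted(hits, key=lambda h: int(h["_source"]["votes"]), reverse=True)
    first, second = ranked[0]["_source"], ranked[1]["_source"]
    first_votes, second_votes = int(first["votes"]), int(second["votes"])
    return (first["party"], first_votes, second["party"], second_votes,
            first_votes - second_votes)
```

For the Carlisle sample this gives `("Conservative", 21472, "Labour", 18873, 2599)`, and a seat counts as a Conservative win when the first element is `"Conservative"`.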
This is the best query I've managed so far:
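from elasticsearch import Elasticsearch

# assumes a cluster reachable at the default local address
es = Elasticsearch("http://localhost:9200")

def filter():
    district = 'Carlisle'
    res = es.search(index="election", body={
        "size": 10,
        # every candidate standing in the constituency
        "query": {"match": {"constituency": district}},
        # highest vote count first; assumes "votes" is mapped as a numeric field
        "sort": [{"votes": {"order": "desc"}}],
        "aggs": {
            # one bucket per party; a terms aggregation takes a single "field",
            # here assumed to be the keyword subfield created by dynamic mapping
            "group_by_party": {
                "terms": {"field": "party.keyword"}
            }
        }
    })
    hits = res['hits']['hits']
    # after the descending sort, the winner and runner-up are the first two hits
    p1 = hits[0]["_source"]["party"]
    p2 = hits[1]["_source"]["party"]
    # _source keeps votes as strings, so convert before subtracting
    p1_vote = int(hits[0]["_source"]["votes"])
    p2_vote = int(hits[1]["_source"]["votes"])
    majority = p1_vote - p2_vote
    return p2 + ' ' + str(p2_vote) + ' ' + p1 + ' ' + str(p1_vote) + ' majority: ' + str(majority)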
This returns:
Labour 18873 Conservative 21472 majority: 2599
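The `sort` on `votes` (and any `max` aggregation on it) only orders numerically if `votes` is mapped as a number. If it isn't, Elasticsearch's default dynamic mapping indexes a JSON string such as "21472" as `text` with a `keyword` subfield: sorting on the `text` field fails, and sorting on `votes.keyword` compares alphabetically, so "9" would rank above "21472". A hedged sketch of an explicit mapping, assuming you can reindex into a new index (an existing field's type cannot be changed in place); the index name and the `create_election_index` helper are hypothetical:

```python
# Hypothetical explicit mapping for the election data. Only the fields that
# matter for the query are listed; the others are still added by dynamic mapping.
# Numeric fields coerce strings like "21472" by default, so the documents can be
# reindexed unchanged (their _source will still show strings).
ELECTION_MAPPING = {
    "mappings": {
        "properties": {
            # text for the existing match query, keyword subfield for aggregations
            "constituency": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            "party": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
            # integer so that sort and max compare vote counts numerically
            "votes": {"type": "integer"},
        }
    }
}


def create_election_index(es, index):
    """Create a new index with ELECTION_MAPPING (7.x-style `body=` call).

    Copying the existing documents into it (e.g. with the _reindex API)
    is a separate step.
    """
    return es.indices.create(index=index, body=ELECTION_MAPPING)
```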
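However, it doesn't quite do what I want. I want to apply the same logic to return 10 constituency seats where the Conservatives hold the majority, and I don't know how to proceed. I'm not sure whether I should combine Python with an ES query, or (preferably) write an ES query that does all the work itself.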
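One way to push most of the work into Elasticsearch is to aggregate per constituency: a `terms` aggregation on the constituency, a `max` of `votes` both overall and inside a `filter` for the Conservative party, a `bucket_selector` that keeps only buckets where those two maxima match, and a `top_hits` of size 2 for the winner and runner-up. Rather than computing the second-highest vote inside Elasticsearch, this sketch reads the two `top_hits` and subtracts in Python. It assumes `votes` is mapped as a number and that `constituency` and `party` have `.keyword` subfields; the helper names are hypothetical and the bucket size of 700 is simply chosen to exceed the 650 constituencies:

```python
def build_conservative_seats_query():
    """Aggregation body: one bucket per constituency won by the Conservatives."""
    # Assumes "votes" is numeric and "constituency"/"party" have .keyword subfields.
    return {
        "size": 0,  # only the aggregation is needed, not individual hits
        "aggs": {
            "by_constituency": {
                # alphabetical bucket order so "the first 10" is predictable
                "terms": {"field": "constituency.keyword", "size": 700,
                          "order": {"_key": "asc"}},
                "aggs": {
                    "max_votes": {"max": {"field": "votes"}},
                    "conservative": {
                        "filter": {"term": {"party.keyword": "Conservative"}},
                        "aggs": {"con_votes": {"max": {"field": "votes"}}},
                    },
                    # drop buckets whose top vote count is not the Conservative one
                    "conservative_won": {
                        "bucket_selector": {
                            "buckets_path": {"top": "max_votes",
                                             "con": "conservative>con_votes"},
                            "script": "params.con == params.top",
                        }
                    },
                    # winner and runner-up, used to compute the majority
                    "top_two": {
                        "top_hits": {
                            "size": 2,
                            "sort": [{"votes": {"order": "desc"}}],
                            "_source": ["party", "votes"],
                        }
                    },
                },
            }
        },
    }


def conservative_majorities(response, limit=10):
    """Return up to `limit` dicts describing Conservative-won seats."""
    rows = []
    for bucket in response["aggregations"]["by_constituency"]["buckets"]:
        if len(rows) >= limit:
            break
        hits = bucket["top_two"]["hits"]["hits"]
        if len(hits) < 2:
            continue  # no runner-up, so no majority to compute
        # _source keeps votes as strings, so compare their integer values
        ranked = sorted(hits, key=lambda h: int(h["_source"]["votes"]), reverse=True)
        first, second = ranked[0]["_source"], ranked[1]["_source"]
        if first["party"] != "Conservative":
            continue  # defensive check in case the bucket_selector was not applied
        rows.append({
            "constituency": bucket["key"],
            "runner_up": second["party"],
            "majority": int(first["votes"]) - int(second["votes"]),
        })
    return rows
```

Usage, assuming an `es` client like the one in the question: `rows = conservative_majorities(es.search(index="election", body=build_conservative_seats_query()))`. Elasticsearch also has a `bucket_sort` pipeline aggregation whose `size` can truncate buckets on the server, which might let you drop the `limit` step; this sketch keeps the truncation in Python.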
Thanks for your help!