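Updating an Elasticsearch index using complex conditions

Time: 2019-11-02 17:55:02

Tags: python csv elasticsearch

I'm working with the 2017 UK general election data. I have it both as a CSV file and in an Elasticsearch index. Here is an example from the Elasticsearch index for the Chichester constituency: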

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 8.03183,
    "hits" : [
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eCtGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "EMERSON",
          "first_name" : "Andrew",
          "party" : "Patria",
          "Party Identifer" : "Patria",
          "votes" : "84"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eStGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "MONCREIFF",
          "first_name" : "Andrew Malcolm",
          "party" : "UK Independence Party (UKIP)",
          "Party Identifer" : "UKIP",
          "votes" : "1650"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eitGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "BARRIE",
          "first_name" : "Heather Margaret",
          "party" : "Green Party",
          "Party Identifer" : "Green Party",
          "votes" : "1992"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "eytGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "BROWN",
          "first_name" : "Jonathan",
          "party" : "Liberal Democrats",
          "Party Identifer" : "Liberal Democrats",
          "votes" : "6749"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "fCtGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "FARWELL",
          "first_name" : "Mark Andrew",
          "party" : "Labour Party",
          "Party Identifer" : "Labour",
          "votes" : "13411"
        }
      },
      {
        "_index" : "ge",
        "_type" : "_doc",
        "_id" : "fStGCG4BaIAfLxq_V2By",
        "_score" : 8.03183,
        "_source" : {
          "code" : "E14000633",
          "PANO" : "145",
          "constituency" : "Chichester",
          "last_name" : "KEEGAN",
          "first_name" : "Gillian",
          "party" : "The Conservative Party Candidate",
          "Party Identifer" : "Conservative",
          "votes" : "36032"
        }
      }
    ]
  }
}

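I want to create a new "column" called "rank", then go through each distinct constituency and add the appropriate number for each candidate. So in the example above, the Conservative candidate would have rank 1, the Labour candidate rank 2, and so on.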

The number of candidates differs from one constituency to the next.
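The ranking described above can be sketched in plain Python, independently of where the rows come from. This is only a minimal sketch: it assumes each row is a dict shaped like the `_source` objects shown earlier, with `constituency` and `votes` keys, and the function name `rank_candidates` is made up for illustration. Votes are converted with `int()` because the index stores them as strings.

```python
from collections import defaultdict


def rank_candidates(rows):
    """Return copies of the rows with a 'rank' key (1 = most votes) per constituency."""
    by_seat = defaultdict(list)
    for row in rows:
        by_seat[row["constituency"]].append(row)

    ranked = []
    for seat_rows in by_seat.values():
        # Votes are strings in the index, so convert before sorting.
        ordered = sorted(seat_rows, key=lambda r: int(r["votes"]), reverse=True)
        for position, row in enumerate(ordered, start=1):
            ranked.append({**row, "rank": position})
    return ranked


# Three of the Chichester rows from the search response above.
sample = [
    {"constituency": "Chichester", "last_name": "EMERSON", "votes": "84"},
    {"constituency": "Chichester", "last_name": "FARWELL", "votes": "13411"},
    {"constituency": "Chichester", "last_name": "KEEGAN", "votes": "36032"},
]
ranks = {r["last_name"]: r["rank"] for r in rank_candidates(sample)}
print(ranks)
```

Because the grouping is done per constituency, it does not matter how many candidates each one has. Tied candidates would simply get consecutive ranks in their original order; a different tie rule would need extra handling.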

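Some of the end goals are: 1) count the seats won, grouped by party; 2) select the constituencies with the smallest majorities and sort them; 3) write an algorithm that indicates the choice a tactical voter should make (depending, of course, on the outcome you want).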
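Goals 1 and 2 might be approached along these lines once the rows are available in Python. This is a sketch under assumptions: rows are dicts with `constituency`, `party` and `votes` keys as in the sample documents, `summarise` is a hypothetical helper name, and the "Exampleton" rows are invented purely to have a second seat to compare against.

```python
from collections import Counter, defaultdict


def summarise(rows):
    """Return (seats won per party, list of (majority, constituency) sorted ascending)."""
    by_seat = defaultdict(list)
    for row in rows:
        by_seat[row["constituency"]].append(row)

    seats = Counter()
    majorities = []
    for seat, seat_rows in by_seat.items():
        ordered = sorted(seat_rows, key=lambda r: int(r["votes"]), reverse=True)
        winner = ordered[0]
        seats[winner["party"]] += 1
        # Majority = winner's votes minus the runner-up's (0 if unopposed).
        runner_up = int(ordered[1]["votes"]) if len(ordered) > 1 else 0
        majorities.append((int(winner["votes"]) - runner_up, seat))
    return seats, sorted(majorities)


rows = [
    # Real figures from the Chichester sample above.
    {"constituency": "Chichester", "party": "The Conservative Party Candidate", "votes": "36032"},
    {"constituency": "Chichester", "party": "Labour Party", "votes": "13411"},
    # Invented rows for illustration only.
    {"constituency": "Exampleton", "party": "Labour Party", "votes": "10000"},
    {"constituency": "Exampleton", "party": "The Conservative Party Candidate", "votes": "9900"},
]
seats, majorities = summarise(rows)
print(seats)
print(majorities)
```

The sorted list of majorities puts the most marginal seats first, which would also be a natural starting point for goal 3.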

I don't know how to do this (other than by manually updating the original spreadsheet).

Should this be done programmatically, directly against the cluster, using cURL commands? Or with a Python script that processes the CSV file?
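For the CSV route, a rough sketch using only the standard library could look like the following. It assumes the CSV headers match the index field names (`constituency`, `votes`), which may not be true of the real file, and `add_rank_column` is a hypothetical name. In-memory `io.StringIO` objects stand in for real files here.

```python
import csv
import io
from collections import defaultdict


def add_rank_column(src, dst):
    """Copy CSV rows from src to dst, appending a per-constituency 'rank' column."""
    reader = csv.DictReader(src)
    rows = list(reader)

    by_seat = defaultdict(list)
    for row in rows:
        by_seat[row["constituency"]].append(row)

    for seat_rows in by_seat.values():
        seat_rows.sort(key=lambda r: int(r["votes"]), reverse=True)
        for position, row in enumerate(seat_rows, start=1):
            row["rank"] = position  # mutates the dict that is also held in `rows`

    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["rank"])
    writer.writeheader()
    writer.writerows(rows)  # original row order is preserved


src = io.StringIO(
    "constituency,last_name,votes\n"
    "Chichester,EMERSON,84\n"
    "Chichester,KEEGAN,36032\n"
)
dst = io.StringIO()
add_rank_column(src, dst)
output_lines = dst.getvalue().splitlines()
print(output_lines)
```

With real files, both would be opened with `newline=""` as the `csv` module documentation recommends. The resulting file could then be re-indexed, so the rank arrives in Elasticsearch together with the rest of the data.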

Could someone please advise on the best approach and provide a code example?

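My first thought was to sort the returned objects for each distinct constituency, then use the total hit count to iterate over the data and update the rank field accordingly. I tried this: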

curl -X POST "localhost:9200/ge/_search?pretty" -H 'Content-Type: application/json' -d'
{
   "query" : {
      "term" : { "Constituency" : "Aldershot" }
   },
   "sort" : [
      {"votes.keyword" : {"order" : "desc"}}
   ]
}'

This returns an empty dataset, so I'm stuck. Thanks for any help.
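Two things in the query above may be worth checking, though neither is confirmed for this cluster. Field names in Elasticsearch are case-sensitive, and the sample documents use `constituency` rather than `Constituency`. Also, a `term` query is not analyzed, so it is usually pointed at a keyword field; the sketch below assumes the default dynamic mapping created a `constituency.keyword` sub-field. Separately, sorting on `votes.keyword` compares strings, so the ranking is done numerically in Python instead. The helper names are hypothetical, and nothing here contacts a cluster.

```python
def build_search_body(constituency, size=100):
    """Build a search body that matches one constituency exactly."""
    return {
        "size": size,  # the default page size of 10 could truncate long candidate lists
        # Assumes a `.keyword` sub-field exists (default dynamic mapping).
        "query": {"term": {"constituency.keyword": constituency}},
    }


def rank_update_actions(hits, index="ge"):
    """Turn search hits into partial-update actions that set a numeric 'rank'."""
    ordered = sorted(hits, key=lambda h: int(h["_source"]["votes"]), reverse=True)
    return [
        {"_op_type": "update", "_index": index, "_id": hit["_id"], "doc": {"rank": position}}
        for position, hit in enumerate(ordered, start=1)
    ]


# Three hits taken from the Chichester response shown earlier.
hits = [
    {"_id": "eCtGCG4BaIAfLxq_V2By", "_source": {"votes": "84"}},
    {"_id": "fStGCG4BaIAfLxq_V2By", "_source": {"votes": "36032"}},
    {"_id": "fCtGCG4BaIAfLxq_V2By", "_source": {"votes": "13411"}},
]
body = build_search_body("Aldershot")
actions = rank_update_actions(hits)

# String sorting puts "84" ahead of "36032", which is why int() is used above.
string_order = sorted(["84", "36032", "13411"], reverse=True)
print(body)
print(actions)
print(string_order)
```

Dicts in this shape are what the `helpers.bulk` function of the `elasticsearch` Python client accepts for partial updates, so the list could be passed to it along with a client object; that step is assumed here rather than shown.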

0 answers:

No answers