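Sorting the output of a Python script by location, with a count for each shared ID

Date: 2016-11-29 22:48:35

Tags: python json

Multiple data instances share the same location ID. For example, the output below contains many 3s: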

    121 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/121/location'}}
    122 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/122/location'}}
    120 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/120/location'}}
    119 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/119/location'}}
    191 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/191/location'}}
    190 {'data': {'id': 52, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/190/location'}}
    193 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/193/location'}}
    187 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/187/location'}}
    189 {'data': {'id': 52, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/189/location'}}
    186 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/186/location'}}
    198 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/198/location'}}
    196 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/196/location'}}
    199 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/199/location'}}
    201 {'data': {'id': 3, 'type': 'location'}, 'links': {'self': 'http://localhost:2510/api/v2/jobs/201/location'}}

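I would like to sort these as follows:

    {'data': {'id': 3, 'type': 'location'} 15
    {'data': {'id': 4, 'type': 'location'} 6
    {'data': {'id': 5, 'type': 'location'} 0
    {'data': {'id': 6, 'type': 'location'} 11

Is there a way to adjust the Python script so that it outputs the data like this?

The data actually comes from a JSON file that looks like this: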

    {
        "links": {
            "self": "http://localhost:2510/api/v2/jobs?skills=data%20science"
        },
        "data": [
            {
                "id": 121,
                "type": "job",
                "attributes": {
                    "title": "Data Scientist",
                    "date": "2014-01-22T15:25:00.000Z",
                    "description": "Data scientists are in increasingly high demand amongst tech companies in London. Generally a combination of business acumen and technical skills are sought. Big data experience ..."
                },
                "relationships": {
                    "location": {
                        "links": {
                            "self": "http://localhost:2510/api/v2/jobs/121/location"
                        },
                        "data": {
                            "type": "location",
                            "id": 3
                        }
                    },
                    "country": {
                        "links": {
                            "self": "http://localhost:2510/api/v2/jobs/121/country"
                        },
                        "data": {
                            "type": "country",
                            "id": 1
                        }
                    },
                    "skills": {
                        "links": { 

and it is parsed with the following Python script:

    import json
    from pprint import pprint

    with open('data.json') as data_file:
        data = json.load(data_file)

    # print each job id together with its location relationship
    for item in data["data"]:
        print(item['id'], item['relationships']['location'])

This is the full data file in my GitHub

2 answers:

Answer 0 (score: 1)

Put the data into a database (for example SQLite), then use "GROUP BY".
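
As a minimal sketch of this suggestion, assuming the JSON has the structure shown in the question: the snippet below loads (job id, location id) pairs into an in-memory SQLite database through the standard `sqlite3` module and counts the jobs per location with `GROUP BY`. The table name `jobs`, its column names, and the helper `count_by_location` are made up for illustration, and `sample` is a tiny hand-written stand-in for the real `data.json`.

```python
import sqlite3


def count_by_location(data):
    """Return [(location_id, job_count), ...] ordered by location id."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE jobs (job_id INTEGER, location_id INTEGER)")
    rows = [
        (item["id"], item["relationships"]["location"]["data"]["id"])
        for item in data["data"]
    ]
    conn.executemany("INSERT INTO jobs VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT location_id, COUNT(*) FROM jobs "
        "GROUP BY location_id ORDER BY location_id"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample in the same shape as the question's JSON.
sample = {
    "data": [
        {"id": 121, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 190, "relationships": {"location": {"data": {"type": "location", "id": 52}}}},
        {"id": 122, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
    ]
}

for location_id, job_count in count_by_location(sample):
    print(location_id, job_count)
```

For a file of this size a database is probably more machinery than needed, but the same query keeps working if the data later moves into a persistent SQLite file.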

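Answer 1 (score: 1)

If I understand correctly, you have a list of items with the following structure:

...

    {'data': {'id': 3, 'type': 'location'} ... }
    {'data': {'id': 3, 'type': 'location'} ... }
    {'data': {'id': 4, 'type': 'location'} ... }

...

and you want to count the number of items for each unique combination of id and type, and print the results in sorted order?

You can use the usual counting-dictionary pattern: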

    counts = dict()
    for item in data['data']:
        # here I assume the items you are looking for are locations;
        # a dictionary key has to be immutable, so make it a tuple
        val = item['relationships']['location']['data']
        location_tuple = (val['id'], val['type'])
        if location_tuple in counts:
            counts[location_tuple] += 1
        else:
            counts[location_tuple] = 1

    # print them out in order: turn the items into a sorted list of tuples
    # (sorts on the first element of the key, which is the id)
    results = sorted(counts.items())

    # results come in like so: ((3, 'location'), 15)
    for item in results:
        print('id:', item[0][0], 'type:', item[0][1], 'count:', item[1])

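The basic idea here is that you can count with a dictionary, using tuples as the keys for all the distinct things you want to count, and then use items() to get the result as (key, count) tuples, which can be sorted. Tuples sort element by element (first element, then second element, and so on), so when you build the tuple, take care to put the primary sort key in the first position, and so on; otherwise you will have to adjust your sort call. You may need to tweak what I have here depending on what you want to extract and print.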
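
The point about adjusting the sort call can be illustrated with a short sketch. It swaps the hand-written counting dictionary for `collections.Counter` from the standard library, then sorts the (key, count) pairs two ways: by the tuple key, and by count in descending order. The `locations` list is hypothetical sample data, not taken from the real file.

```python
from collections import Counter

# Hypothetical (id, type) pairs, as they would be pulled out of each item.
locations = [(3, "location"), (52, "location"), (3, "location"),
             (4, "location"), (3, "location"), (52, "location")]

counts = Counter(locations)  # maps each tuple to how often it occurs

# Default sort: compares the (id, type) key first, so results are ordered by id.
by_id = sorted(counts.items())

# Custom sort: highest count first, ties broken by the key.
by_count = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

for (loc_id, loc_type), n in by_id:
    print("id:", loc_id, "type:", loc_type, "count:", n)
```

Passing a `key` function to `sorted` is one way to change the ordering without rebuilding the tuples themselves.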