I am currently using the following Python script:
import json
from collections import defaultdict
from pprint import pprint

with open('prettyPrint.txt') as data_file:
    data = json.load(data_file)

locations = defaultdict(list)
for item in data['data']:
    location = item['relationships']['location']['data']['id']
    locations[location].append(item['id'])

pprint(locations)
to parse some messy JSON data that looks like this:
{
  "links": {
    "self": "http://localhost:2510/api/v2/jobs?skills=data%20science"
  },
  "data": [
    {
      "id": 121,
      "type": "job",
      "attributes": {
        "title": "Data Scientist",
        "date": "2014-01-22T15:25:00.000Z",
        "description": "Data scientists are in increasingly high demand amongst tech companies in London. Generally a combination of business acumen and technical skills are sought. Big data experience ..."
      },
      "relationships": {
        "location": {
          "links": {
            "self": "http://localhost:2510/api/v2/jobs/121/location"
          },
          "data": {
            "type": "location",
            "id": 3
          }
        },
        "country": {
          "links": {
            "self": "http://localhost:2510/api/v2/jobs/121/country"
          },
          "data": {
            "type": "country",
            "id": 1
          }
        },
At the moment the output looks like this:
85: [36026,
36028,
36032,
36027,
217897,
286398,
315064,
320879,
322303,
322608,
322611,
323199,
325659,
327652],
88: [13690,
13693,
13689,
13692,
13691,
16454,
16453,
28002,
28003,
28004,
28001,
114667,
233319,
233329,
263814,
271490,
271571,
271569,
271570,
291274,
291275,
300376,
300373,
301293,
301295,
304286,
304285,
320425,
320426,
320424,
320431,
320430,
321284,
321281,
321283,
321282,
321280,
324345,
327926,
347985,
358537,
358549,
357807,
364541,
358431,
334990,
359241],
But I would like to change it so that the output looks like this:
...
87: 02
88: 73
89: 15
90: 104
...
I know I need to put some kind of i=0, i++ in somewhere, but I can't figure it out. How can I do this?
Answer 0 (score: 1)
You only need the number of items for each location, not the actual items stored in the locations dict. Use int as the default factory of defaultdict:
locations = defaultdict(int)
# the default value for every missing key is `0`
and change the for loop to:
for item in data['data']:
    location = item['relationships']['location']['data']['id']
    locations[location] += 1  # increase the count by `1`
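Put together, the counting version could look roughly like the sketch below. The inline sample dict is made up and only mirrors the fields the loop reads from the question's JSON, and the function name count_by_location is a hypothetical helper, not something from the original script.

```python
from collections import defaultdict

# Made-up sample in the same shape as the question's JSON
# (only the fields the loop actually reads are included).
sample = {
    "data": [
        {"id": 121, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 122, "relationships": {"location": {"data": {"type": "location", "id": 3}}}},
        {"id": 123, "relationships": {"location": {"data": {"type": "location", "id": 5}}}},
    ]
}


def count_by_location(data):
    """Return a dict mapping each location id to the number of jobs there."""
    locations = defaultdict(int)  # missing keys start at 0
    for item in data["data"]:
        location = item["relationships"]["location"]["data"]["id"]
        locations[location] += 1
    return dict(locations)


counts = count_by_location(sample)

# Print one "location: count" line per location, ordered by location id.
for location in sorted(counts):
    print(f"{location}: {counts[location]}")
```

With the real file, data would come from json.load(data_file) as in the question instead of the inline sample.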
Alternatively, and better, use collections.Counter() with a generator expression, as mentioned by @TigerhawkT3:
from collections import Counter
Counter(item['relationships']['location']['data']['id'] for item in data['data'])
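To get the "location: count" lines asked for in the question, the Counter result can be sorted and printed. This is only a sketch: the raw JSON string is a made-up stand-in for the contents of prettyPrint.txt, and the names raw, location_counts and lines are my own.

```python
import json
from collections import Counter

# Made-up JSON text standing in for the file contents.
raw = """
{"data": [
  {"id": 1, "relationships": {"location": {"data": {"type": "location", "id": 88}}}},
  {"id": 2, "relationships": {"location": {"data": {"type": "location", "id": 87}}}},
  {"id": 3, "relationships": {"location": {"data": {"type": "location", "id": 88}}}}
]}
"""
data = json.loads(raw)

# Count how many items reference each location id.
location_counts = Counter(
    item["relationships"]["location"]["data"]["id"] for item in data["data"]
)

# Build one "location: count" line per location, ordered by location id.
lines = [f"{loc}: {n}" for loc, n in sorted(location_counts.items())]
print("\n".join(lines))
```

If the busiest locations should come first instead, location_counts.most_common() returns the pairs ordered by count.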