Extracting JSON data from an API with Python

Time: 2016-12-01 15:25:39

Tags: python json exception-handling

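The data in question comes from an API, which means it is very inconsistent: sometimes it returns unexpected content, sometimes it returns nothing at all, and so on.

What I am interested in is the ISO 3166-2 data associated with each record.

The data (when no error occurs) usually looks like this: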

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "RO", "adminCode1": "10", "countryName": "Romania", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "10"}, {"type": "ISO3166-2", "code": "B"}], "adminName1": "Bucure\u015fti"}
{"countryCode": "DE", "adminCode1": "07", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "07"}, {"type": "ISO3166-2", "code": "NW"}], "adminName1": "North Rhine-Westphalia"}
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
{"countryCode": "DE", "adminCode1": "02", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "02"}, {"type": "ISO3166-2", "code": "BY"}], "adminName1": "Bavaria"}

Let's take a single record as an example:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}

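From this I want to extract the ISO 3166-2 representation, i.e. DE-BW.

I have been trying different ways of extracting this information with Python. One attempt looked like this: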

coord = response.get('codes', {}).get('type', {}).get('ISO3166-2', None)

Another attempt looked like this:

print(json.dumps(response["codes"]["ISO3166-2"]))

However, neither of these approaches worked.

How can I take a record like the following:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}

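and use Python to extract only DE-BW, while also coping with records that do not look exactly the same, e.g. also extracting GB-ENG from the following: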

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}

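And of course it should not crash when it receives something that looks like neither of those, i.e. it needs exception handling.

Full file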

import json
import requests
from collections import defaultdict
from pprint import pprint

# open up the output of 'data-processing.py'
with open('job-numbers-by-location.txt') as data_file:

    for line in data_file:
        identifier, name, coords, number_of_jobs = line.split("|")
        coords = coords[1:-1]
        lat, lng = coords.split(",")
        # print("lat: " + lat, "lng: " + lng)
        response = requests.get("http://api.geonames.org/countrySubdivisionJSON?lat="+lat+"&lng="+lng+"&username=s.matthew.english").json()


        codes = response.get('codes', [])
        for code in codes:
            if code.get('type') == 'ISO3166-2':
                print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
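
The script above unpacks each input line directly, so a single malformed line would raise a ValueError and stop the loop. A minimal sketch of a more defensive parser follows. It assumes each line has the form identifier|name|(lat,lng)|count, which is only inferred from the split("|") and coords[1:-1] calls above. The name parse_line is hypothetical.

```python
def parse_line(line):
    """Split 'identifier|name|(lat,lng)|count' into its parts.

    Returns (identifier, name, lat, lng, number_of_jobs), or None when the
    line does not match the assumed layout.
    """
    parts = line.strip().split('|')
    if len(parts) != 4:
        return None
    identifier, name, coords, number_of_jobs = parts
    try:
        # Assumption: the coordinates are wrapped in parentheses.
        lat_text, lng_text = coords.strip('()').split(',')
        lat, lng = float(lat_text), float(lng_text)
    except ValueError:
        # Wrong number of coordinates, or a value that is not a number.
        return None
    return identifier, name, lat, lng, number_of_jobs
```

Inside the loop, a line for which parse_line returns None could simply be skipped with continue before any request is made.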

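1 answer:

Answer 0 (score: 1)

'ISO3166-2' is a dictionary value, not a key. The 'codes' entry is a list of dictionaries, so neither .get() nor a string index works on it directly. Iterate over the list and compare each entry's 'type' value instead: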

codes = response.get('codes', [])
for code in codes:
    if code.get('type') == 'ISO3166-2':
        print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
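
To cover the "should not crash" part of the question, the same loop can be wrapped in a function that returns None for anything that does not look like a normal record. This is only a sketch. The function name extract_iso3166_2 and the error payload used in the example are hypothetical, not taken from the API documentation.

```python
def extract_iso3166_2(response):
    """Return a string such as 'DE-BW', or None if the record lacks the data."""
    if not isinstance(response, dict):
        return None
    country = response.get('countryCode')
    codes = response.get('codes')
    # Bail out early when either field is missing or has an unexpected type.
    if not isinstance(country, str) or not isinstance(codes, list):
        return None
    for entry in codes:
        if isinstance(entry, dict) and entry.get('type') == 'ISO3166-2':
            code = entry.get('code')
            if isinstance(code, str) and code:
                return '{}-{}'.format(country, code)
    return None


if __name__ == '__main__':
    record = {"countryCode": "DE", "adminCode1": "01",
              "codes": [{"type": "FIPS10-4", "code": "01"},
                        {"type": "ISO3166-2", "code": "BW"}]}
    print(extract_iso3166_2(record))  # DE-BW
    # Hypothetical error payload with no 'codes' key.
    print(extract_iso3166_2({"status": {"message": "no result"}}))  # None
```

Returning None instead of printing 'UNKNOWN' lets the caller decide whether to skip, log, or retry a record.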