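The data in question comes from an API, which means it is highly inconsistent: sometimes it returns unexpected content, sometimes it returns nothing at all, and so on.
What I'm interested in is the ISO 3166-2 data associated with each record.
The data (when it doesn't hit an error) usually looks like this: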
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "RO", "adminCode1": "10", "countryName": "Romania", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "10"}, {"type": "ISO3166-2", "code": "B"}], "adminName1": "Bucure\u015fti"}
{"countryCode": "DE", "adminCode1": "07", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "07"}, {"type": "ISO3166-2", "code": "NW"}], "adminName1": "North Rhine-Westphalia"}
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
{"countryCode": "DE", "adminCode1": "02", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "02"}, {"type": "ISO3166-2", "code": "BY"}], "adminName1": "Bavaria"}
Let's take one record as an example:
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
From this I want to extract the ISO 3166-2 representation, i.e. DE-BW.
I've been trying different ways of extracting this information with Python. One attempt looked like this:
coord = response.get('codes', {}).get('type', {}).get('ISO3166-2', None)
Another attempt looked like this:
print(json.dumps(response["codes"]["ISO3166-2"]))
However, neither of these approaches worked.
How can I take a record like the following:
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
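and use Python to extract only DE-BW, while also coping with records that don't look exactly the same, for example also extracting GB-ENG from the following: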
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
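And of course it shouldn't crash if it receives something that doesn't look like either of these, i.e. it needs exception handling.
Full file: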
import json
import requests
from collections import defaultdict
from pprint import pprint

# open up the output of 'data-processing.py'
with open('job-numbers-by-location.txt') as data_file:
    for line in data_file:
        identifier, name, coords, number_of_jobs = line.split("|")
        coords = coords[1:-1]  # drop the first and last character around the coordinate pair
        lat, lng = coords.split(",")
        # print("lat: " + lat, "lng: " + lng)
        response = requests.get("http://api.geonames.org/countrySubdivisionJSON?lat="+lat+"&lng="+lng+"&username=s.matthew.english").json()
        # 'codes' is a list of {"type": ..., "code": ...} dicts; default to an empty list
        codes = response.get('codes', [])
        for code in codes:
            if code.get('type') == 'ISO3166-2':
                print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
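Answer 0 (score: 1)
'ISO3166-2' is a dictionary value, not a key. "codes" is a list of dictionaries, so you have to loop over it and compare each entry's 'type' value instead of looking 'ISO3166-2' up as a key: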
codes = response.get('codes', [])
for code in codes:
    if code.get('type') == 'ISO3166-2':
        print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
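If you also want to skip records that don't have the expected shape, rather than printing UNKNOWN or crashing, the same loop can be wrapped in a small helper. This is only a sketch: the function names extract_iso_3166_2 and extract_from_json_text are made up for illustration, and it assumes that any record without a string countryCode and an ISO3166-2 entry in "codes" should simply yield None.

```python
import json


def extract_iso_3166_2(record):
    """Return 'COUNTRY-SUBDIVISION' (e.g. 'DE-BW'), or None if the record lacks it.

    Hypothetical helper: it never raises on unexpected input, it returns None.
    """
    if not isinstance(record, dict):
        return None
    country = record.get("countryCode")
    codes = record.get("codes")
    # Error responses or empty results may lack one or both fields.
    if not isinstance(country, str) or not isinstance(codes, list):
        return None
    for entry in codes:
        # 'ISO3166-2' is the *value* stored under the 'type' key.
        if isinstance(entry, dict) and entry.get("type") == "ISO3166-2":
            code = entry.get("code")
            if isinstance(code, str) and code:
                return "{}-{}".format(country, code)
    return None


def extract_from_json_text(text):
    """Parse a JSON string and extract the code; None if parsing fails."""
    try:
        record = json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return None
    return extract_iso_3166_2(record)
```

In the loop from the question you could then call extract_iso_3166_2(response) and only print the result when it is not None. Network failures and non-JSON bodies from requests would still need their own try/except around the requests.get(...).json() call.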