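I managed to download some JSON that I now want to export to a csv file. With my code in its current state, however, the final csv ends up with roughly one dictionary per cell. What I want instead is one column for each key I am interested in, holding that key's value. Each JSON contains a lot of information I am not actually interested in; I only want keys such as cadId, cadNomeCompleto, cadProfissao and habDes. Some of these are nested inside other categories of each JSON, for example habDes inside pt_ar_wsgode_objectos_DadosHabilitacoes, which is inside cadHabilitacoes, which is inside RegistoBiograficoList.
I have searched some JSON documentation to see whether there is a function that takes the keys as input in the way I need. So far I have not been able to parse only the keys I want and export them, for example into uniform columns in a csv file. Can someone explain what I am doing wrong and tell me how to do this?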
import json
import csv
import requests  # needed for requests.get() below
from csv import DictWriter
list_json = ['a705932387657456c4a535355794d45786c5a326c7a6247463064584a684c314a6c5a326c7a644739436157396e636d466d61574e7657456c4a53563971633239754c6e523464413d3d&fich=RegistoBiograficoXIII_json.txt&Inline=true',
'a705932387657456c4a4a5449775447566e61584e7359585231636d4576556d566e61584e3062304a706232647959575a705932395953556c66616e4e76626935306548513d&fich=RegistoBiograficoXII_json.txt&Inline=true',
'a705932387657456b6c4d6a424d5a5764706332786864485679595339535a57647063335276516d6c765a334a685a6d6c6a6231684a5832707a6232347564486830&fich=RegistoBiograficoXI_json.txt&Inline=true',
'a7059323876574355794d45786c5a326c7a6247463064584a684c314a6c5a326c7a644739436157396e636d466d61574e7657463971633239754c6e523464413d3d&fich=RegistoBiograficoX_json.txt&Inline=true',
'a7059323876566b6c4a535355794d45786c5a326c7a6247463064584a684c314a6c5a326c7a644739436157396e636d466d61574e76566b6c4a53563971633239754c6e523464413d3d&fich=RegistoBiograficoVIII_json.txt&Inline=true',
'a7059323876566b6c4a535355794d45786c5a326c7a6247463064584a684c314a6c5a326c7a644739436157396e636d466d61574e76566b6c4a53563971633239754c6e523464413d3d&fich=RegistoBiograficoVIII_json.txt&Inline=true',
'a7059323876566b6c4a4a5449775447566e61584e7359585231636d4576556d566e61584e3062304a706232647959575a705932395753556c66616e4e76626935306548513d&fich=RegistoBiograficoVII_json.txt&Inline=true',
'a7059323876566b6b6c4d6a424d5a5764706332786864485679595339535a57647063335276516d6c765a334a685a6d6c6a62315a4a5832707a6232347564486830&fich=RegistoBiograficoVI_json.txt&Inline=true',
'a7059323876566955794d45786c5a326c7a6247463064584a684c314a6c5a326c7a644739436157396e636d466d61574e76566c3971633239754c6e523464413d3d&fich=RegistoBiograficoV_json.txt&Inline=true',
'a70593238765356596c4d6a424d5a5764706332786864485679595339535a57647063335276516d6c765a334a685a6d6c6a62306c575832707a6232347564486830&fich=RegistoBiograficoIV_json.txt&Inline=true',
'a705932387653556c4a4a5449775447566e61584e7359585231636d4576556d566e61584e3062304a706232647959575a705932394a53556c66616e4e76626935306548513d&fich=RegistoBiograficoIII_json.txt&Inline=true',
'a705932387653556b6c4d6a424d5a5764706332786864485679595339535a57647063335276516d6c765a334a685a6d6c6a62306c4a5832707a6232347564486830&fich=RegistoBiograficoII_json.txt&Inline=true',
'a7059323876513239756333527064485670626e526c4c314a6c5a326c7a644739436157396e636d466d61574e765132397563313971633239754c6e523464413d3d&fich=RegistoBiograficoCons_json.txt&Inline=true']
result = []
for i in list_json:
    url = 'http://app.parlamento.pt/webutils/docs/doc.txt?path=6148523063446f764c324679626d56304c3239775a57356b595852684c3052685a47397a51574a6c636e5276637939535a576470633352764a544977516d6c765a334c446f575{}'.format(i)
    r = requests.get(url)
    cont = r.json()
    result.append(cont)
with open('bio.csv', 'w', newline='', encoding='utf-8-sig') as outfile:
    writer = DictWriter(outfile, ('?xml', 'RegistoBiografico'))
    writer.writerows(result)
Answer 0 (score: 0)
You can iterate over the child entries to extract the data, for example:
result = []
for child in cont['RegistoBiografico']['RegistoBiograficoList']['pt_ar_wsgode_objectos_DadosRegistoBiograficoWeb']:
    tmp_row = []
    # iterate through keys in which we're interested
    for k in ['cadId', 'cadNomeCompleto', 'cadProfissao']:
        try:
            tmp_row.append(child[k])
        except KeyError:
            print(f" missing {k} for {child['cadId']}")
            # insert None for missing value so columns still match
            tmp_row.append(None)
    result.append(tmp_row)
Running this shows that some entries do not have all of the data:
missing cadProfissao for 5950
missing cadProfissao for 6063
missing cadProfissao for 6121
missing cadProfissao for 5534
missing cadProfissao for 695
missing cadProfissao for 5952
missing cadProfissao for 4104
missing cadProfissao for 4389
missing cadProfissao for 2445
>>> result[123]
['5854', 'ISABEL CRISTINA RUA PIRES', 'operadora de call cen´ter']
>>>
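To add nested keys, you could insert tmp_row.append(child['a']['b']['c']), but you would then have to repeat the missing-value handling for each of them.
With the jsonpointer module, you can instead specify the path to the value you want to access: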
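If you would rather not repeat the try/except for every nested key and do not want an extra dependency, a small helper can walk a list of keys and fall back to a default. This is only a sketch: get_nested is a hypothetical name (not part of the standard library), and the sample record is made up to mirror the nesting described in the question.

```python
def get_nested(record, keys, default=None):
    """Follow keys one level at a time; return default if any step fails."""
    current = record
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Key missing, index out of range, or value is not subscriptable
            return default
    return current


# Made-up sample shaped like one entry of the real data (an assumption)
child = {
    'cadId': '1',
    'cadHabilitacoes': {
        'pt_ar_wsgode_objectos_DadosHabilitacoes': {'habDes': 'Direito'}
    },
}

path = ['cadHabilitacoes', 'pt_ar_wsgode_objectos_DadosHabilitacoes', 'habDes']
hab_des = get_nested(child, path)              # nested value found
missing = get_nested(child, ['cadProfissao'])  # key absent, so default is used
```

Inside the loop this would be used as tmp_row.append(get_nested(child, path)), so a missing value still yields None and the columns stay aligned.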
from jsonpointer import resolve_pointer as j_get
result = []
search_dict = {
    'Id': '/cadId',
    'NomeCompleto': '/cadNomeCompleto',
    'Profissao': '/cadProfissao',
    'habDes': '/cadHabilitacoes/pt_ar_wsgode_objectos_DadosHabilitacoes/habDes',
}
for child in cont['RegistoBiografico']['RegistoBiograficoList']['pt_ar_wsgode_objectos_DadosRegistoBiograficoWeb']:
    tmp_row = []
    # iterate through keys in which we're interested
    for k in search_dict.keys():
        tmp_row.append(j_get(child, search_dict[k], None))
    result.append(tmp_row)
Since I pass a default value of None to the resolve_pointer function, I removed the KeyError exception handling. The result now contains:
>>> result[123]
['5854', 'ISABEL CRISTINA RUA PIRES', 'operadora de call cen´ter', 'Ciência Política']
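One caveat worth checking against your data: if a person has several qualifications, the converted JSON may hold a list of objects rather than a single object at pt_ar_wsgode_objectos_DadosHabilitacoes. That is an assumption about the data, not something confirmed above. In that case a fixed path to habDes would not match and the cell would come out as None. The sketch below, with the hypothetical helper collect_values and made-up sample values, handles both shapes and joins the values into one cell.

```python
def collect_values(node, key):
    """Return the values of `key` from a dict, or from each dict in a list."""
    if isinstance(node, dict):
        node = [node]          # treat a single object like a one-item list
    if not isinstance(node, list):
        return []              # None or any other type: nothing to collect
    return [item[key] for item in node
            if isinstance(item, dict) and key in item]


# Made-up examples of the two shapes (assumptions, not real records)
single = {'habDes': 'Direito'}
several = [{'habDes': 'Direito'}, {'habDes': 'Economia'}]

one = '; '.join(collect_values(single, 'habDes'))
many = '; '.join(collect_values(several, 'habDes'))
empty = '; '.join(collect_values(None, 'habDes'))
```

Joining with '; ' keeps one row per person; if you prefer one row per qualification, you would append a row for each collected value instead.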
If you are interested in which rows are incomplete, or how many there are, you can use a list comprehension:
>>> len([x for x in result if None in x])
165
However, this is easier to see in the csv output.
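To actually produce that csv, the rows built above are plain lists, so csv.writer fits better than DictWriter. The following is a minimal sketch: write_rows is a hypothetical helper, the two rows are made-up stand-ins for result, and the header simply reuses the keys of search_dict. csv.writer writes None as an empty cell, so incomplete rows keep their column positions.

```python
import csv
import io

search_dict = {
    'Id': '/cadId',
    'NomeCompleto': '/cadNomeCompleto',
    'Profissao': '/cadProfissao',
    'habDes': '/cadHabilitacoes/pt_ar_wsgode_objectos_DadosHabilitacoes/habDes',
}
header = list(search_dict.keys())

# Made-up rows standing in for `result` (not real records)
result = [
    ['1', 'EXAMPLE NAME ONE', 'jurista', 'Direito'],
    ['2', 'EXAMPLE NAME TWO', None, None],
]


def write_rows(outfile, header, rows):
    """Write one header line followed by one line per row."""
    writer = csv.writer(outfile)
    writer.writerow(header)
    writer.writerows(rows)


# An in-memory buffer keeps the sketch free of side effects
buffer = io.StringIO(newline='')
write_rows(buffer, header, result)
csv_text = buffer.getvalue()

# To write a real file instead:
# with open('bio.csv', 'w', newline='', encoding='utf-8-sig') as outfile:
#     write_rows(outfile, header, result)
```

To cover all of the downloaded documents rather than only the last cont, you would build the rows inside the loop over the URLs and write them once at the end.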