I'm working with a large JSON file that I want to convert to CSV for further analysis. When I use json_normalize to build the table, I get the following error:
Traceback (most recent call last):
  File "/Users/Home/Downloads/JSONtoCSV/easybill.py", line 30, in <module>
    "status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option"
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 248, in json_normalize
    _recursive_extract(data, record_path, {}, level=0)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 235, in _recursive_extract
    meta_val = _pull_field(obj, val[level:])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 169, in _pull_field
    result = result[field]
TypeError: list indices must be integers, not str
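As a first step, I validated the code with many tests on smaller JSON files. Now that I've put everything together for the full JSON, I get this error message.
How can I fix this? I'm trying to implement the normalization with pandas as shown here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization
Here is my code so far. Thanks for your help!
Edit: here is the JSON source: https://pastebin.com/muGBPWv8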
# -*- coding: utf-8 -*-
import pandas
import json
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
from pandas.io.json import json_normalize
# Paths
json_file_path = "/Users/Home/Downloads/JSONtoCSV/JSON-Files/Seite0.json"
csv_file_path = "/Users/Home/Downloads/JSONtoCSV/CSV-files/Seite0.csv"
node = "items"
# JSON file open, no pagination information
with open(json_file_path) as f:
    rawjson = json.load(f)
data = rawjson[node]
# remove "number" because it causes errors in pandas.
good_data = eval(repr(data).replace("number", "numbr"))
# normalization
norm_data = json_normalize(good_data, "items", [
["address","city"], ["address","company_name"], ["address","country"], ["address","first_name"], ["address","last_name"], ["address","personal"], ["address","salutation"], ["address","street"], ["address","suffix_1"], ["address","suffix_2"], ["address","title"], ["address","zip_code"],
"amount", "amount_net", "attachment_ids", "bank_debit_form", "cancel_id", "cash_allowance", "cash_allowance_days", "cash_allowance_text", "contact_id", "contact_label", "contact_text", "created_at", "currency", "customer_id", "discount", "discount_type", "document_date", "due_date", "edited_at", "external_id", "grace_period", "id", "is_archive", "is_draft", "is_replica",
["items","booking_account"], ["items","cost_price_charge"], ["items","cost_price_charge_type"], ["items","cost_price_net"], ["items","cost_price_total"], ["items","description"], ["items","discount"], ["items","discount_type"], ["items","export_cost_1"], ["items","export_cost_2"], ["items","id"], ["items","numbr"], ["items","position"], ["items","position_id"], ["items","quantity"], ["items","quantity_str"], ["items","serial_number"], ["items","serial_number_id"], ["items","single_price_gross"], ["items","single_price_net"], ["items","total_price_gross"], ["items","total_price_net"], ["items","total_vat"], ["items","type"], ["items","unit"], ["items","vat_percent"],
"label_address", "label_address", "login_id", "numbr", "paid_amount", "paid_at", "pdf_pages", "pdf_template", "project_id", "ref_id", "replica_url",
["service_date","type"], ["service_date","date"], ["service_date","date_from"], ["service_date","date_to"], ["service_date","text"],
"status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option"
])
# save to csv
norm_data.to_csv(csv_file_path, sep=";")
Answer (score: 0)
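I see a few problems with your code:
Your metadata names conflict. For example, you have 'id' as metadata (a first-level key) and also 'id' as an element of 'items'. This can be solved by passing a fourth argument, meta_prefix, to json_normalize, e.g. json_normalize(good_data, "items", [...], "meta.").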
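A minimal sketch of the meta_prefix fix, assuming pandas 1.0 or later (where the function is exposed as pandas.json_normalize; the older pandas.io.json.json_normalize used in the question takes the same record_path, meta and meta_prefix parameters). The two-document list is a made-up stand-in for the invoice JSON, not the real data:

```python
import pandas as pd

# Hypothetical, heavily reduced stand-in for the invoice JSON: each document
# has an "id", and so does every entry in its "items" list.
docs = [
    {"id": 1, "status": "DONE",
     "items": [{"id": 10, "quantity": 2}, {"id": 11, "quantity": 1}]},
    {"id": 2, "status": "OPEN",
     "items": [{"id": 20, "quantity": 5}]},
]

# Without a prefix, the document-level "id" collides with the item-level "id"
# and json_normalize raises a ValueError about conflicting metadata names.
try:
    pd.json_normalize(docs, record_path="items", meta=["id", "status"])
except ValueError as exc:
    print("without prefix:", exc)

# meta_prefix renames only the metadata columns, so both ids can coexist.
df = pd.json_normalize(docs, record_path="items", meta=["id", "status"],
                       meta_prefix="meta.")
print(df)
```

The record columns ("id", "quantity") keep their names, and the metadata shows up as "meta.id" and "meta.status", repeated once per item.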
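json_normalize expects the metadata to be stored in dicts (possibly nested dicts), but some of your fields have list values, e.g. attachment_ids. It seems that json_normalize currently cannot handle them.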
Also, it seems that json_normalize cannot handle empty objects such as "label_address": {}.
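Finally, you probably don't need the entries ["items","booking_account"], ["items","cost_price_charge"], ... in the third (meta) argument of json_normalize, because elements at those paths are already retrieved as your records (due to the second argument of json_normalize). These entries are most likely what raises your TypeError: for each document, obj["items"] is a list, so looking up "booking_account" in it fails with "list indices must be integers, not str".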
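A small reproduction of that last point, again assuming pandas 1.0 or later and a made-up one-document input. A meta path that descends into the "items" list fails with the same TypeError, while item fields already arrive as record columns and nested dict paths such as ["address", "city"] work as metadata:

```python
import pandas as pd

# Hypothetical miniature document: "address" is a dict, "items" is a list.
docs = [
    {"id": 1, "address": {"city": "Berlin"},
     "items": [{"id": 10, "quantity": 2}, {"id": 11, "quantity": 1}]},
]

# A meta path into the "items" list makes pandas index a list with a string.
try:
    pd.json_normalize(docs, record_path="items",
                      meta=["id", ["items", "quantity"]], meta_prefix="meta.")
except TypeError as exc:
    print("TypeError:", exc)

# Item fields come from the records, so meta only lists document-level fields.
df = pd.json_normalize(docs, record_path="items",
                       meta=["id", ["address", "city"]], meta_prefix="meta.")
print(df)
```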
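Given these problems with json_normalize, I wouldn't use it for your case. Instead, I would write simple imperative code (with loops/list comprehensions) that builds a table (a list of lists) from your JSON, and then create a pandas DataFrame from that table.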
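A sketch of that loop-based approach, written for Python 3 and pandas. The field names come from the question's meta list, but the particular selection, the column naming, the hypothetical cell helper (which joins list values like attachment_ids with commas and turns empty dicts into missing values), and the one-row-per-item layout are my assumptions, not something the data dictates:

```python
import pandas as pd

# Hypothetical column selection; extend these lists with the other fields
# from the question's meta list as needed.
DOC_FIELDS = ["id", "status", "attachment_ids", "label_address"]
ADDRESS_FIELDS = ["city", "zip_code"]
ITEM_FIELDS = ["id", "description", "quantity", "total_price_net"]


def cell(value):
    """Hypothetical helper: flatten list values and empty dicts into plain cells."""
    if isinstance(value, list):
        return ",".join(str(v) for v in value)
    if isinstance(value, dict):
        return None if not value else str(value)
    return value


def build_rows(docs):
    """One row per invoice item, repeating the invoice-level fields."""
    rows = []
    for doc in docs:
        address = doc.get("address") or {}
        base = [cell(doc.get(f)) for f in DOC_FIELDS]
        base += [cell(address.get(f)) for f in ADDRESS_FIELDS]
        for item in doc.get("items") or []:
            rows.append(base + [cell(item.get(f)) for f in ITEM_FIELDS])
    return rows


columns = (["doc_" + f for f in DOC_FIELDS]
           + ["address_" + f for f in ADDRESS_FIELDS]
           + ["item_" + f for f in ITEM_FIELDS])

docs = [  # hypothetical stand-in for rawjson["items"]
    {"id": 1, "status": "DONE", "attachment_ids": [5, 6], "label_address": {},
     "address": {"city": "Berlin", "zip_code": "10115"},
     "items": [{"id": 10, "description": "A", "quantity": 2, "total_price_net": 200},
               {"id": 11, "description": "B", "quantity": 1, "total_price_net": 50}]},
]

df = pd.DataFrame(build_rows(docs), columns=columns)
print(df)
# df.to_csv(csv_file_path, sep=";")  # same output call as in the question
```

Because the keys are read directly with .get, there is no need for the repr/replace trick on "number", and missing keys simply become empty cells. Note that in this sketch a document without any items produces no rows; add a fallback row in build_rows if such documents must appear in the CSV.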