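JSON normalization with pandas - list indices must be integers

Date: 2017-10-15 12:24:34

Tags: python json pandas

I'm working with a large JSON file that I'd like to convert to CSV for further analysis. When I build the table with json_normalize, I get the following error: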

  

    Traceback (most recent call last):
      File "/Users/Home/Downloads/JSONtoCSV/easybill.py", line 30, in <module>
        "status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option"
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 248, in json_normalize
        _recursive_extract(data, record_path, {}, level=0)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 235, in _recursive_extract
        meta_val = _pull_field(obj, val[level:])
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/json/normalize.py", line 169, in _pull_field
        result = result[field]
    TypeError: list indices must be integers, not str

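As a first step, I ran many tests with smaller, reduced JSON files to validate the code. Now that I have put everything together for the full JSON, I get this error message.

How can I fix this? I'm trying to do the normalization with pandas as shown here: http://pandas.pydata.org/pandas-docs/stable/io.html#normalization

Here is my code so far. Thanks for your help!

Edit: here is the JSON source: https://pastebin.com/muGBPWv8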

# -*- coding: utf-8 -*-
import pandas
import json
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
from pandas.io.json import json_normalize

# Paths
json_file_path = "/Users/Home/Downloads/JSONtoCSV/JSON-Files/Seite0.json"
csv_file_path = "/Users/Home/Downloads/JSONtoCSV/CSV-files/Seite0.csv"
node = "items"

# JSON file open, no pagination information
with open(json_file_path) as f:
    rawjson = json.load(f)
data = rawjson[node]

# remove "number" because it causes errors in pandas.
good_data = eval(repr(data).replace("number", "numbr"))

# normalization
norm_data =  json_normalize(good_data, "items", [
["address","city"], ["address","company_name"], ["address","country"], ["address","first_name"], ["address","last_name"], ["address","personal"], ["address","salutation"], ["address","street"], ["address","suffix_1"], ["address","suffix_2"], ["address","title"], ["address","zip_code"], 
"amount", "amount_net", "attachment_ids", "bank_debit_form", "cancel_id", "cash_allowance", "cash_allowance_days", "cash_allowance_text", "contact_id", "contact_label", "contact_text", "created_at", "currency", "customer_id", "discount", "discount_type", "document_date", "due_date", "edited_at", "external_id", "grace_period", "id", "is_archive", "is_draft", "is_replica",
["items","booking_account"], ["items","cost_price_charge"], ["items","cost_price_charge_type"], ["items","cost_price_net"], ["items","cost_price_total"], ["items","description"], ["items","discount"], ["items","discount_type"], ["items","export_cost_1"], ["items","export_cost_2"], ["items","id"], ["items","numbr"], ["items","position"], ["items","position_id"], ["items","quantity"], ["items","quantity_str"], ["items","serial_number"], ["items","serial_number_id"], ["items","single_price_gross"], ["items","single_price_net"], ["items","total_price_gross"], ["items","total_price_net"], ["items","total_vat"], ["items","type"], ["items","unit"], ["items","vat_percent"],
"label_address", "label_address", "login_id", "numbr", "paid_amount", "paid_at", "pdf_pages", "pdf_template", "project_id", "ref_id", "replica_url",
["service_date","type"], ["service_date","date"], ["service_date","date_from"], ["service_date","date_to"], ["service_date","text"], 
"status", "text", "text_prefix", "title", "type", "use_shipping_address", "vat_option"
])

# save to csv
norm_data.to_csv(csv_file_path, sep=";")

1 Answer:

Answer 0 (score: 0)

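I see several problems with your code:

  1. Your metadata keys conflict with your record keys. For example, you have 'id' both as metadata (a top-level field) and as a field of the elements of 'items'. This can be resolved by passing an additional fourth argument (meta_prefix) to json_normalize, e.g.

    json_normalize(good_data, "items", [...], "meta.")

  2. json_normalize expects metadata to be stored in dictionaries (possibly nested ones), but some of your fields hold lists, e.g. attachment_ids. It seems json_normalize currently cannot handle them.

  3. It also seems that json_normalize cannot handle empty dictionaries, e.g. "label_address": {}

  4. Finally, you probably don't need the entries ["items","booking_account"], ["items","cost_price_charge"], ... in the third (metadata) argument of json_normalize, because elements with such paths are already retrieved as your data (i.e. through the second argument of json_normalize).

  5. Given these problems with json_normalize, I would prefer not to use it here. Instead, just write simple imperative code (with loops/list comprehensions) that builds a table (a list of lists) from your JSON, and then create a pandas dataframe from that table.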
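
As a rough illustration of that last point, here is a minimal sketch of the loop-based approach. It assumes the structure implied by the question's code: a list of documents, each with nested dictionaries such as address and a list of line items under "items". The helper names flatten and build_rows, the "meta." prefix convention, and the sample data are made up for illustration and are not taken from the pastebin JSON. It builds a list of dicts rather than a list of lists so that column names travel with the values.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys.

    Lists are stored as JSON strings and empty dicts become None, so the
    two cases mentioned above each end up in one table cell.
    """
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, name + "."))
        elif isinstance(value, dict):
            flat[name] = None  # empty dict, e.g. "label_address": {}
        elif isinstance(value, list):
            flat[name] = json.dumps(value)  # e.g. attachment_ids
        else:
            flat[name] = value
    return flat


def build_rows(documents, record_key="items", meta_prefix="meta."):
    """Return one flat dict per line item.

    Fields of the parent document are copied into every row under
    meta_prefix, which avoids clashes such as the two different 'id' keys.
    Documents without line items produce no rows.
    """
    rows = []
    for doc in documents:
        meta = {k: v for k, v in doc.items() if k != record_key}
        meta_flat = {meta_prefix + k: v for k, v in flatten(meta).items()}
        for item in doc.get(record_key) or []:
            row = flatten(item)
            row.update(meta_flat)
            rows.append(row)
    return rows


# Hypothetical sample shaped like the data described in the question.
sample = {
    "items": [
        {
            "id": 1,
            "address": {"city": "Berlin", "zip_code": "10115"},
            "attachment_ids": [7, 8],
            "label_address": {},
            "items": [
                {"id": 10, "quantity": 2},
                {"id": 11, "quantity": 1},
            ],
        }
    ]
}

rows = build_rows(sample["items"])
```

A list of dicts like rows can be passed directly to pandas.DataFrame(rows), and the resulting frame can be written out with to_csv(csv_file_path, sep=";") as in the question. Whether lists should be serialized, joined, or dropped depends on what the later analysis needs, so treat that choice as a placeholder.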