Error when converting from JSON to CSV format

Time: 2013-03-30 13:39:54

Tags: python json csv

Someone sent me this code to convert from JSON to CSV format.

Here is the code for json2csv.

import sys, json, csv

input = open(sys.argv[1])
json_array = json.load(input)
input.close()

item_data = json_array
if len(item_data) >= 1:
    first_item_id = item_data[0]['item_id']
    columns = item_data[0].keys()

csv_file = open(sys.argv[2], "wb")
writer = csv.writer(csv_file)
# there is currently a known bug where column names are partially uppercase, this    will  be fixed soon. the "map(lambda x: x.lower(), columns)" fixes this issue in the mean time
writer.writerow(map(lambda x: x.lower(), columns)) 

# here .items() is a standard python function
for item in item_data:
    row = []
    for column_name in columns:
        if column_name.lower() == 'name_part': # lower required due to above issue
            row.append(" ".join(item[column_name]))
        else:
            row.append(item[column_name])
    writer.writerow(row)

Here is my JSON data. I saved it as transaction.json

{"comment": "Developer test ", "invoice_intern_external_ids": "", "invoice_payments": [{"payment_id": 8, "payment_method": "Refund", "timestamp": "2013-03-05", "invoice_id": 12, "writeoff_reason": "", "payment": 160.0}, {"payment_id": 9, "payment_method": "Cash", "timestamp": "2013-03-05", "invoice_id": 12, "writeoff_reason": "", "payment": 160.0}], "tax": 0.0, "pay_to_external_id": -1, "total": 0.0, "pay_to_contact_id": 13, "client_external_id": 11, "is_draft": false, "invoice_clinician_external_id": 999925, "location": "Therapy A", "invoice_clinician_id": 7, "bill_to_external_id": 11, "timestamp": "2013-03-05", "client_contact_id": 16, "subtotal": 0.0, "invoice_id": 26, "write_off": 0.0, "invoice_items": [{"item_tax": 0.0, "item_name": "InitialVisit_O", "timestamp": "2013-03-05", "item_unit_price": 160.0, "tax": 0.0, "invoice_item_id": 21, "invoice_instance_id": 26, "total": 0.0, "subtotal": 0.0, "item_description": "Initial Assessment/hour", "quantity": 0.0}], "billing_date": "2013-03-05", "invoice_intern_ids": "[]", "bill_to_contact_id": 16, "balance": 0.0, "invoice_instance_id": 12}
{"comment": "", "invoice_intern_external_ids": null, "invoice_payments": [], "tax": 0.0, "pay_to_external_id": -1, "total": 260.0, "pay_to_contact_id": 13, "client_external_id": -1, "is_draft": false, "invoice_clinician_external_id": null, "location": "Sports Medicine", "invoice_clinician_id": 7, "bill_to_external_id": -1, "timestamp": "2013-02-25", "client_contact_id": 15, "subtotal": 260.0, "invoice_id": 23, "write_off": 0.0, "invoice_items": [{"item_tax": 0.0, "item_name": "CompAsses", "timestamp": "2013-02-25", "item_unit_price": 260.0, "tax": 0.0, "invoice_item_id": 36, "invoice_instance_id": 23, "total": 260.0, "subtotal": 260.0, "item_description": "Comp Assess Report", "quantity": 1.0}], "billing_date": "2013-02-22", "invoice_intern_ids": "[]", "bill_to_contact_id": 15, "balance": 260.0, "invoice_instance_id": 10}

I ran c:\python.exe c:\json2csv.py c:\transaction.json c:\transaction.txt and got this error:

Extra data line2 column 1 - line 12 column1 (char 1105 - char 11267)

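It would be great if someone could correct the code so that it picks up all the fields. I don't even need every field in the CSV. I only need client_external_id, invoice_clinician_id, invoice_id, location, item_name, item_unit_price, item_description, quantity

This has been dragging on for a long time. I need to get it done today. Please help.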

1 Answer:

Answer 0 (score: 1)

There are several problems here:

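  1. Your JSON data is actually several separate JSON documents, one after another, which is why the parser reports "Extra data" after the first one. If you have a lot of data this will be tedious to fix by hand, although Martijn's suggestion to read the file line by line may help, assuming the data really is one JSON object per line. Otherwise the data needs to be fixed so that it looks like this: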

    [{"comment": "Developer test ", "invoice_intern_external_ids": "" ...},
     {"comment": "", "invoice_intern_external_ids": null, ...}]
    

    Note the opening and closing square brackets, and the comma after each JSON {} object (except the last one).

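  2. The script you were given is not particularly general. It assumes that "item_id" exists in the first JSON object, but it does not. That can be fixed, though.

  3. Your invoice_payments data is a list of dictionaries. That means your data is hierarchical. How do you want it converted to CSV, which is just a flat list of values? That is not obvious. The script you showed does not handle this; it is generic and assumes your JSON data is flat.

  4. A fixed converter: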

    import sys, json, csv
    
    # The input holds one JSON object per line, so parse it line by line.
    input = open(sys.argv[1])
    json_array = []
    for data in input.readlines():
        json_array.append(json.loads(data))
    input.close()
    
    item_data = json_array
    if len(item_data) >= 1:
        columns = item_data[0].keys()
    
    csv_file = open(sys.argv[2], "wb")
    writer = csv.writer(csv_file)
    # There is currently a known bug where column names are partially uppercase; this will be fixed soon. In the meantime, "map(lambda x: x.lower(), columns)" works around it.
    writer.writerow(map(lambda x: x.lower(), columns))
    
    # Write one row per JSON object, keeping the same column order as the header.
    for item in item_data:
        row = []
        for column_name in columns:
            if column_name.lower() == 'name_part': # lower required due to above issue
                row.append(" ".join(item[column_name]))
            else:
                row.append(item[column_name])
        writer.writerow(row)
    

    This produces the following CSV:

    comment,invoice_intern_external_ids,invoice_payments,tax,pay_to_external_id,total,pay_to_contact_id,client_external_id,is_draft,invoice_clinician_external_id,location,invoice_instance_id,invoice_clinician_id,bill_to_external_id,timestamp,client_contact_id,subtotal,invoice_id,write_off,invoice_items,invoice_intern_ids,bill_to_contact_id,balance,billing_date
    Developer test ,,"[{u'payment_id': 8, u'payment_method': u'Refund', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}, {u'payment_id': 9, u'payment_method': u'Cash', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}]",0.0,-1,0.0,13,11,False,999925,Therapy A,12,7,11,2013-03-05,16,0.0,26,0.0,"[{u'item_tax': 0.0, u'item_name': u'InitialVisit_O', u'timestamp': u'2013-03-05', u'item_unit_price': 160.0, u'tax': 0.0, u'subtotal': 0.0, u'invoice_item_id': 21, u'total': 0.0, u'invoice_instance_id': 26, u'item_description': u'Initial Assessment/hour', u'quantity': 0.0}]",[],16,0.0,2013-03-05
    ,,[],0.0,-1,260.0,13,-1,False,,Sports Medicine,10,7,-1,2013-02-25,15,260.0,23,0.0,"[{u'item_tax': 0.0, u'item_name': u'CompAsses', u'timestamp': u'2013-02-25', u'item_unit_price': 260.0, u'tax': 0.0, u'subtotal': 260.0, u'invoice_item_id': 36, u'total': 260.0, u'invoice_instance_id': 23, u'item_description': u'Comp Assess Report', u'quantity': 1.0}]",[],15,260.0,2013-02-22
    

    Note how your invoice_payments data has been converted to a string:

    "[{u'payment_id': 8, u'payment_method': u'Refund', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}, {u'payment_id': 9, u'payment_method': u'Cash', u'invoice_id': 12, u'timestamp': u'2013-03-05', u'writeoff_reason': u'', u'payment': 160.0}]",0.0,-1,0.0,13,11,False,999925,Therapy A,12,7,11,2013-03-05,16,0.0,26,0.0,"[{u'item_tax': 0.0, u'item_name': u'InitialVisit_O', u'timestamp': u'2013-03-05', u'item_unit_price': 160.0, u'tax': 0.0, u'subtotal': 0.0, u'invoice_item_id': 21, u'total': 0.0, u'invoice_instance_id': 26, u'item_description': u'Initial Assessment/hour', u'quantity': 0.0}]"
    

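    Nothing that imports this CSV will be able to make real sense of that. Your JSON data cannot easily be converted to CSV; you have to decide and specify what the CSV data should look like.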
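One possible layout, given the fields the question asks for, is one CSV row per entry in invoice_items, with the invoice-level fields repeated on each row. The sketch below is only an illustration under that assumption, not the only sensible choice. It is written for Python 3 (the script above is Python 2), and the names read_json_lines, flatten_invoices and write_csv are made up for this example. The sample lines are shortened copies of the two records in the question.

```python
import csv
import io
import json

# Columns requested in the question: the first four live on the invoice,
# the rest on each entry of "invoice_items".
INVOICE_FIELDS = ["client_external_id", "invoice_clinician_id", "invoice_id", "location"]
ITEM_FIELDS = ["item_name", "item_unit_price", "item_description", "quantity"]


def read_json_lines(lines):
    """Parse one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def flatten_invoices(invoices):
    """Yield one flat row per invoice item (an assumed layout)."""
    for invoice in invoices:
        head = [invoice.get(name) for name in INVOICE_FIELDS]
        for item in invoice.get("invoice_items", []):
            yield head + [item.get(name) for name in ITEM_FIELDS]


def write_csv(invoices, out_file):
    """Write the header row followed by the flattened rows."""
    writer = csv.writer(out_file)
    writer.writerow(INVOICE_FIELDS + ITEM_FIELDS)
    writer.writerows(flatten_invoices(invoices))


# Shortened sample with the same shape as transaction.json.
sample = [
    '{"client_external_id": 11, "invoice_clinician_id": 7, "invoice_id": 26, '
    '"location": "Therapy A", "invoice_items": [{"item_name": "InitialVisit_O", '
    '"item_unit_price": 160.0, "item_description": "Initial Assessment/hour", '
    '"quantity": 0.0}]}',
    '{"client_external_id": -1, "invoice_clinician_id": 7, "invoice_id": 23, '
    '"location": "Sports Medicine", "invoice_items": [{"item_name": "CompAsses", '
    '"item_unit_price": 260.0, "item_description": "Comp Assess Report", '
    '"quantity": 1.0}]}',
]

buffer = io.StringIO()
write_csv(read_json_lines(sample), buffer)
print(buffer.getvalue())
```

To run this against real files in Python 3, pass the opened input file to read_json_lines and open the output with open(path, "w", newline="") before handing it to write_csv. Note that with this layout an invoice that has no invoice_items produces no rows at all, and invoice_payments is ignored entirely; whether that is acceptable depends on what the CSV is for.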