Question

I have the following data in a CSV file:

helpful_reply,reply_by,thread_id
"[{""helpful_reply"":""1 person found this helpful""},{""helpful_reply"":""""},{""helpful_reply"":""1 person found this helpful""}]","[{""reply_by"":""Adam""},{""reply_by"":""John""},{""reply_by"":""Smith""}]","149617"
"[{""helpful_reply"":""1 person found this helpful""},{""helpful_reply"":""""},{""helpful_reply"":""1 person found this helpful""}]","[{""reply_by"":""John""},{""reply_by"":""Mary""},{""reply_by"":""Smith""}]","147223"

It contains 3 columns: helpful_reply, reply_by, thread_id.

The columns "helpful_reply" and "reply_by" contain JSON arrays.
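To see what the two JSON columns look like once parsed, here is a minimal sketch (Python 3) that feeds one data line, shortened to two array elements, through `csv.reader` and `json.loads`. The variable names are illustrative only. The point it shows is that the csv module already turns each doubled `""` back into a single `"`, so every field can be handed straight to `json.loads`.

```python
import csv
import io
import json

# One data line in the same shape as helpful.csv (shortened to two elements):
# the JSON columns are quoted CSV fields in which every inner " is doubled to "".
line = ('"[{""helpful_reply"":""1 person found this helpful""},{""helpful_reply"":""""}]",'
        '"[{""reply_by"":""Adam""},{""reply_by"":""John""}]","149617"\n')

row = next(csv.reader(io.StringIO(line)))
# csv.reader has already undone the "" escaping, so row[0] and row[1] are plain JSON text
helpful = json.loads(row[0])
reply_by = json.loads(row[1])
thread_id = row[2]  # a plain string, not a JSON array

print(helpful)    # [{'helpful_reply': '1 person found this helpful'}, {'helpful_reply': ''}]
print(reply_by)   # [{'reply_by': 'Adam'}, {'reply_by': 'John'}]
print(thread_id)  # 149617
```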

I want to convert this file into another CSV file containing a table like this:

| helpful_reply               | reply_by   | thread_id |
|-----------------------------|------------|-----------|
| 1 person found this helpful | Adam       | 149617    |
| NULL                        | John       | 149617    |
| 1 person found this helpful | Smith      | 149617    |
| 1 person found this helpful | John       | 147223    |
| NULL                        | Mary       | 147223    |
| 1 person found this helpful | Smith      | 147223    |
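Put another way, each input row is expanded into one output row per array element: the n-th element of `helpful_reply` is paired with the n-th element of `reply_by`, and `thread_id` is repeated. A minimal sketch of that mapping for a single row follows. `explode_row` is a hypothetical helper name, and using `itertools.zip_longest` instead of plain `zip` (which silently stops at the shorter list) is an assumption about how a length mismatch should be handled: here the missing side simply becomes `NULL`.

```python
import json
from itertools import zip_longest


def explode_row(helpful_json, reply_by_json, thread_id):
    """Hypothetical helper: turn one input row into one output row per array element."""
    helpful = json.loads(helpful_json)
    reply_by = json.loads(reply_by_json)
    out = []
    # zip_longest pads the shorter array with {} instead of dropping elements;
    # an empty string or a missing key is written as "NULL", as in the table above
    for h, r in zip_longest(helpful, reply_by, fillvalue={}):
        out.append([h.get("helpful_reply") or "NULL",
                    r.get("reply_by") or "NULL",
                    thread_id])
    return out
```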

So far I have written this code, and I'm not sure whether I'm taking a good approach:

import csv
import json
with open('helpful.csv', encoding='utf-8-sig') as csvfile:
    csvreader=csv.reader(csvfile,delimiter=',',quotechar='"')
    ofile=open('output.csv', 'w')
    rownum=0
    for row in csvreader:
        if rownum==0:
            header=row
        else:
            column=0
            for col in row:
                x=col
                x=json.loads(col)
                if isinstance(x,int):
                    print(x)
                else:
                    y=header[column]
                    for x in x:
                        ofile.write(x[y]+"\n")
                column+=1
        rownum+=1
    ofile.close()

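Running the code above writes the data to output.csv one value per line (the thread_id values never reach the file; they are only printed to the console):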

1 person found this helpful

1 person found this helpful
Adam
John
Smith
1 person found this helpful

1 person found this helpful
John
Mary
Smith

So how can I save the data in tabular (CSV) format?

Answer 1

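The JSON is a bit twisted: a list of replies, then a separate list of users, so you need to make sure their order is preserved. But once you have a row, you don't need to go into CSV details at all: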

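import json

# `row` is one record from csv.reader (a list of 3 strings)
helpful_reply_list = json.loads(row[0])
reply_by_list = json.loads(row[1])
thread_id = row[2]  # plain string, not JSON, so no lookup by key

# Printing it to make it simpler in my code, you put it in a file
# zip() pairs the n-th reply with the n-th user, so the order is preserved
for helpful_reply, reply_by in zip(helpful_reply_list, reply_by_list):
    print('%s\t%s\t%s' % (
        helpful_reply["helpful_reply"] or None,
        reply_by["reply_by"],
        thread_id))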

Do this for each row and you're done.
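Applying that to every row, a minimal end-to-end sketch might look like the following. It assumes Python 3, the question's input file `helpful.csv`, and comma-delimited output. `flatten_rows` and `convert` are hypothetical names, and empty replies are written as `NULL` to match the table in the question, rather than the `None` printed above.

```python
import csv
import json


def flatten_rows(rows):
    """Yield one output row per element of the two JSON arrays in each input row."""
    for row in rows:
        helpful_reply_list = json.loads(row[0])
        reply_by_list = json.loads(row[1])
        thread_id = row[2]  # plain string, no JSON decoding needed
        # zip() pairs the n-th reply with the n-th user, so order is preserved
        for helpful_reply, reply_by in zip(helpful_reply_list, reply_by_list):
            yield [helpful_reply["helpful_reply"] or "NULL",  # "" becomes NULL
                   reply_by["reply_by"],
                   thread_id]


def convert(in_path, out_path):
    """Read the nested CSV at in_path and write the flat table to out_path."""
    with open(in_path, encoding="utf-8-sig", newline="") as infile, \
         open(out_path, "w", encoding="utf-8", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        writer.writerow(next(reader))  # the column names are the same in both files
        writer.writerows(flatten_rows(reader))
```

Call it as `convert("helpful.csv", "output.csv")`. Opening both files with `newline=""` follows the csv module's documented recommendation, and `writer.writerows` consumes the generator directly, so rows are streamed instead of being collected in memory first.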

Answer 2

Your input data layout is certainly a bit convoluted, but I think the following is at least very close to what you want.

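You didn't really specify the format of the output CSV file, so I just guessed and used the | character as the delimiter. The columns don't line up the way they do in your question, but in a CSV file that shouldn't matter.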

import csv
import json

with open('helpful.csv', 'r', encoding='utf-8-sig', newline='') as infile, \
     open('output.csv', 'w', encoding='utf-8-sig', newline='') as outfile:

    fieldnames = 'helpful_reply', 'reply_by', 'thread_id'  # output file
    csvreader = csv.reader(infile, delimiter=',', quotechar='"')
    csvwriter = csv.DictWriter(outfile, fieldnames, delimiter='|', quotechar='"')

    next(csvreader)  # skip header of input file
    csvwriter.writeheader()  # write header of output file

    # read and write rows of both files
    for row in csvreader:
        data = [json.loads(col) for col in row]
        thread_id = data[2]
        for helpful_reply, reply_by in zip(data[0], data[1]):
            row = dict(**helpful_reply, **reply_by, thread_id=thread_id)
            if not row['helpful_reply']: row['helpful_reply'] = "NULL"
            csvwriter.writerow(row)

Contents of the output.csv file produced from your sample input data:

helpful_reply|reply_by|thread_id
1 person found this helpful|Adam|149617
NULL|John|149617
1 person found this helpful|Smith|149617
1 person found this helpful|John|147223
NULL|Mary|147223
1 person found this helpful|Smith|147223
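Because this output uses `|` instead of a comma, anything that reads it back has to be told the delimiter. A short sketch, assuming Python 3 and the output.csv written by the code above (`read_pipe_csv` is a hypothetical helper): it opens the file as `utf-8-sig` because the writer used that encoding and therefore put a byte-order mark at the start of the file. With plain `utf-8`, the first header name would come back as `'\ufeffhelpful_reply'`.

```python
import csv


def read_pipe_csv(path):
    """Hypothetical helper: load a '|'-delimited file into a list of dicts."""
    # utf-8-sig strips the leading BOM; DictReader takes the field names from the header line
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f, delimiter="|"))
```

`read_pipe_csv("output.csv")[0]` should then compare equal to `{'helpful_reply': '1 person found this helpful', 'reply_by': 'Adam', 'thread_id': '149617'}`. Every value comes back as a string, including `thread_id` and the literal `NULL`.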

From multiple JSON data to one table
