I have the following data in a CSV file:
helpful_reply,reply_by,thread_id
"[{""helpful_reply"":""1 person found this helpful""},{""helpful_reply"":""""},{""helpful_reply"":""1 person found this helpful""}]","[{""reply_by"":""Adam""},{""reply_by"":""John""},{""reply_by"":""Smith""}]","149617"
"[{""helpful_reply"":""1 person found this helpful""},{""helpful_reply"":""""},{""helpful_reply"":""1 person found this helpful""}]","[{""reply_by"":""John""},{""reply_by"":""Mary""},{""reply_by"":""Smith""}]","147223"
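It contains 3 columns: helpful_reply, reply_by, and thread_id. The "helpful_reply" and "reply_by" columns contain JSON arrays.
I want to convert this file into another CSV file containing a table like this: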
| helpful_reply | reply_by | thread_id |
|-----------------------------|------------|-----------|
| 1 person found this helpful | Adam | 149617 |
| NULL | John | 149617 |
| 1 person found this helpful | Smith | 149617 |
| 1 person found this helpful | John | 147223 |
| NULL | Mary | 147223 |
| 1 person found this helpful | Smith | 147223 |
So far I have written this code, and I am not sure whether I am taking a good approach:
import csv
import json
with open('helpful.csv', encoding='utf-8-sig') as csvfile:
    csvreader=csv.reader(csvfile,delimiter=',',quotechar='"')
    ofile=open('output.csv', 'w')
    rownum=0
    for row in csvreader:
        if rownum==0:
            header=row
        else:
            column=0
            for col in row:
                x=col
                x=json.loads(col)
                if isinstance(x,int):
                    print(x)
                else:
                    y=header[column]
                    for x in x:
                        ofile.write(x[y]+"\n")
                column+=1
        rownum+=1
    ofile.close()
Running the code above produces the data one value per line:
1 person found this helpful
1 person found this helpful
Adam
John
Smith
1 person found this helpful
1 person found this helpful
John
Mary
Smith
So how can I save the data in tabular (CSV) format?
Answer 0 (score: 0)
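The JSON is a bit twisted: a list of replies followed by a list of users, so you need to make sure the order is preserved. But once you have a row, there is no need to go into the CSV details anyway: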
import json

# `row` is one data row from csv.reader (header already skipped)
helpful_reply_list = json.loads(row[0])
reply_by_list = json.loads(row[1])
thread_id = row[2]  # already a plain string, not a JSON object
# Printing to keep the example simple; in your code, write to a file instead
for helpful_reply, reply_by in zip(helpful_reply_list, reply_by_list):
    print('%s\t%s\t%s' % (
        helpful_reply["helpful_reply"] or "NULL",
        reply_by["reply_by"],
        thread_id))
Do this for every row and you are done.
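To make "do this for every row" concrete, here is a minimal sketch that applies the same zip-based pairing to a whole file and writes the result with csv.writer instead of printing. The function name explode_rows and the null_marker parameter are made up for illustration, and the in-memory sample is a shortened version of the first data row from the question.

```python
import csv
import io
import json


def explode_rows(infile, outfile, null_marker="NULL"):
    """Write one output row per (helpful_reply, reply_by) pair.

    `explode_rows` and `null_marker` are hypothetical names for this sketch.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader)  # helpful_reply, reply_by, thread_id
    writer.writerow(header)
    for row in reader:
        helpful_reply_list = json.loads(row[0])
        reply_by_list = json.loads(row[1])
        thread_id = row[2]
        # zip pairs the i-th reply with the i-th user, so order is preserved
        for helpful_reply, reply_by in zip(helpful_reply_list, reply_by_list):
            writer.writerow([
                helpful_reply["helpful_reply"] or null_marker,
                reply_by["reply_by"],
                thread_id,
            ])


# In-memory demo with a shortened version of the first sample row
sample = (
    'helpful_reply,reply_by,thread_id\n'
    '"[{""helpful_reply"":""1 person found this helpful""},{""helpful_reply"":""""}]",'
    '"[{""reply_by"":""Adam""},{""reply_by"":""John""}]","149617"\n'
)
out = io.StringIO()
explode_rows(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the same function should work if both are opened with newline='' (and encoding='utf-8-sig' for the input, as in the question). Note that zip stops at the shorter list, so a row whose two JSON arrays differ in length would silently lose entries.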
Answer 1 (score: 0)
The layout of your input data is certainly a bit convoluted, but I think the following is at least very close to what you want.
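You did not really specify the format of the output CSV file, so I just guessed and used the | character as the delimiter. The columns do not line up nicely the way they do in your question, but that should not matter in a CSV file.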
import csv
import json
with open('helpful.csv', 'r', encoding='utf-8-sig', newline='') as infile, \
     open('output.csv', 'w', encoding='utf-8-sig', newline='') as outfile:
    fieldnames = 'helpful_reply', 'reply_by', 'thread_id'  # output file
    csvreader = csv.reader(infile, delimiter=',', quotechar='"')
    csvwriter = csv.DictWriter(outfile, fieldnames, delimiter='|', quotechar='"')
    next(csvreader)  # skip header of input file
    csvwriter.writeheader()  # write header of output file
    # read and write rows of both files
    for row in csvreader:
        data = [json.loads(col) for col in row]
        thread_id = data[2]
        for helpful_reply, reply_by in zip(data[0], data[1]):
            row = dict(**helpful_reply, **reply_by, thread_id=thread_id)
            if not row['helpful_reply']:
                row['helpful_reply'] = "NULL"
            csvwriter.writerow(row)
Contents of the output.csv file generated from your sample input data:
helpful_reply|reply_by|thread_id
1 person found this helpful|Adam|149617
NULL|John|149617
1 person found this helpful|Smith|149617
1 person found this helpful|John|147223
NULL|Mary|147223
1 person found this helpful|Smith|147223
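Because this output uses | rather than a comma, anything that reads it back has to be told about the delimiter. As a rough check, the sketch below parses the first few lines of the output above with csv.DictReader; the text is held in memory here, and the variable names are only illustrative.

```python
import csv
import io

# First lines of the pipe-delimited output shown above, held in memory
output_text = (
    "helpful_reply|reply_by|thread_id\n"
    "1 person found this helpful|Adam|149617\n"
    "NULL|John|149617\n"
)

# delimiter must match the one used when the file was written
reader = csv.DictReader(io.StringIO(output_text), delimiter="|")
rows = list(reader)
for r in rows:
    print(r["thread_id"], r["reply_by"], r["helpful_reply"])
```

When reading the actual output.csv, open it with encoding='utf-8-sig' as well, since the file was written with a byte order mark and it would otherwise end up attached to the first column name.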