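I'm trying to convert a JSON file to a CSV file, and the CSV comes out malformed. The columns are created correctly, but the text placed under the column headers is spread out absurdly.
For example, the timestamp column should occupy a single column, yet each row's timestamp from the JSON file spans five columns. When I edit the script to write only the text field to the CSV file, the problem does not occur (only one column is used).
What causes a script to work when it handles a single field but break when it is given more?
The code is as follows: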
__author__ = 'seandolinar'
import json
import csv
import io
data_json = io.open('2018_to_2019-03-11.json', mode='r', encoding='utf-8').read()
data_python = json.loads(data_json)
csv_out = io.open('2018_to_2019-03-11.csv', mode='w', encoding='utf-8')
fields = u'timestamp, text, retweets, favorites'
csv_out.write(fields)
csv_out.write(u'\n')
for line in data_python:
    row = [line.get('timestamp'),
           '"' + line.get('text').replace('"','""') + '"',
           line.get('retweets'),
           line.get('favorites')]
    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')
csv_out.close()
Here is one entry from my JSON file:
{
    "id": "1104890307706060802",
    "timestamp": "4:42 PM - 10 Mar 2019",
    "text": "“There’s not one shred of evidence that President Trump has done anything wrong.” @GrahamLedger One America News. So true, a total Witch Hunt - All started illegally by Crooked Hillary Clinton, the DNC and others!",
    "link": "https://twitter.com/realDonaldTrump/status/1104890307706060802",
    "is_retweet": false,
    "retweets": "19K",
    "favorites": "76K",
    "replies": "17K"
},
Answer 0 (score: 0)
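The code you provided takes a roundabout route that doesn't make sense to me, and it never uses the csv module it imports. What follows is a speculative approach: it may surface an error that the author sees but we can't, and we can work from there.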
import csv
import json
# Read as UTF-8 explicitly, since the tweets contain non-ASCII characters
with open('example_input.json', encoding='utf-8') as infile:
    input_data = json.load(infile)
# These 3 lines could be consolidated but I'm being explicit about building a
# nested list
output_headers = ['timestamp', 'text', 'retweets', 'favorites']
output_to_write = []
output_to_write.append(output_headers)
# Now iterate the JSON data and append rows as lists
for row in input_data:
    output_to_write.append([row.get('timestamp'),
                            row.get('text'),
                            row.get('retweets'),
                            row.get('favorites')])
# newline='' lets the csv module control line endings itself
with open('example_output.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(output_to_write)
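If it helps to check that the csv module really keeps each value in its own column, here is a minimal sketch using csv.DictWriter, which picks the wanted keys straight out of each dictionary. The sample records are made up for illustration (they are not from your file), and the idea that your text values contain commas, double quotes, or line breaks is only an assumption on my part. The sketch writes to an in-memory buffer and reads the result back so the column count can be inspected.

```python
import csv
import io

# Hypothetical records shaped like the JSON entry in the question.
# The first text value deliberately contains a comma, double quotes and a
# line break; the second record has a None text and no "favorites" key.
records = [
    {"id": "1", "timestamp": "4:42 PM - 10 Mar 2019",
     "text": 'He said "hi", then left.\nSecond line',
     "retweets": "19K", "favorites": "76K"},
    {"id": "2", "timestamp": "5:00 PM - 10 Mar 2019",
     "text": None, "retweets": "1K"},
]

fieldnames = ["timestamp", "text", "retweets", "favorites"]

# An in-memory buffer stands in for the output file; newline="" keeps the
# csv module in charge of line endings, as with a real file.
buffer = io.StringIO(newline="")

# extrasaction="ignore" skips keys not listed in fieldnames (such as "id");
# restval="" fills in keys that are missing from a record.
writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                        extrasaction="ignore", restval="")
writer.writeheader()
writer.writerows(records)

# Read the CSV back to confirm every row has exactly four columns.
buffer.seek(0)
rows = list(csv.reader(buffer))
for row in rows:
    print(len(row), row)
```

With a real file you would replace the buffer with open('example_output.csv', 'w', newline='', encoding='utf-8'). If the rows still look split across columns after that, the culprit is more likely the program used to view the CSV than the file itself.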