Question

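I have a 48 MB JSON file (a collection of data I mined). I need to convert the JSON file to CSV so that I can import it into a SQL database and clean it up.

I have tried every JSON-to-CSV converter I could find, but they all return the same result: "file exceeds limit" / file too large. Is there a good way to convert a JSON file this large to CSV in a short amount of time?

Thanks!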

Answer 1

A 48 MB JSON file is quite small. You should be able to load the data into memory with something like this:

import json

with open('data.json') as data_file:    
    data = json.load(data_file)

Depending on how you wrote the JSON file, data is probably a list containing many dictionaries. Try running:

type(data)

If the type is list, iterate over the elements and inspect each one. For example:

for row in data:
    print(type(row))
    # print(row.keys())
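
If the rows turn out to be dicts, it can also help to see which keys occur and how often, since not every row necessarily has the same fields. A minimal sketch, assuming data is a list of dicts; the count_keys helper and the sample rows are hypothetical and only stand in for your loaded data:

```python
from collections import Counter


def count_keys(rows):
    """Count how many dict rows contain each key."""
    counts = Counter()
    for row in rows:
        # Skip anything that is not a dict (e.g. stray strings or numbers).
        if isinstance(row, dict):
            counts.update(row.keys())
    return counts


# Hypothetical sample standing in for the loaded JSON data.
sample = [
    {"username": "ann", "tweet_text": "hi"},
    {"username": "bob", "tweet_text": "yo", "timestamp": "2019-01-01"},
]
key_counts = count_keys(sample)
print(key_counts)
```

Keys that appear in fewer rows than the total are the ones that will need a default value when you build the CSV.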

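If row is a dict instance, inspect its keys and, inside the loop, start building what each row of the CSV should contain. You can then use pandas, the csv module, or just open a file and write it line by line with your own commas.

So it might look something like this: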

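import json

# Load the whole file into memory; 48 MB fits comfortably.
with open('data.json') as data_file:
    data = json.load(data_file)

with open('some_file.txt', 'w') as f:
    for row in data:
        # These key names are examples; use the keys your rows actually have.
        user = row['username']
        text = row['tweet_text']
        created = row['timestamp']
        # str() guards against non-string values such as numeric timestamps.
        joined = ",".join(str(value) for value in [user, text, created])
        f.write(joined + "\n")  # one CSV row per line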

You may still run into problems with Unicode characters, commas inside the data, and so on... but this is a general guide.
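
For the comma and quoting problems, the standard library's csv module does the escaping for you. Here is a rough sketch using csv.DictWriter, assuming the rows are dicts; the rows_to_csv helper, the field names, and the sample rows are made up for illustration:

```python
import csv
import io


def rows_to_csv(rows, fieldnames, out):
    """Write dict rows to a file-like object as CSV, quoting where needed."""
    # extrasaction="ignore" drops keys not listed in fieldnames;
    # restval="" fills in fields that a row is missing.
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical rows with an embedded comma, embedded quotes, a missing
# field, and an extra field.
sample = [
    {"username": "ann", "tweet_text": "hello, world", "timestamp": "2019-01-01"},
    {"username": "bob", "tweet_text": 'she said "hi"', "extra": 1},
]
buffer = io.StringIO()
rows_to_csv(sample, ["username", "tweet_text", "timestamp"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, pass a file opened with open('some_file.csv', 'w', newline='', encoding='utf-8') as the out argument. The newline='' part is what the csv module documentation recommends, and the explicit encoding should take care of most Unicode trouble.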
