Question

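I'm trying to create a JSON file from a CSV. The code below creates the data, but not quite the way I want it. I have some experience with Python. As I understand it, a JSON file should be written as [{}, {}, ..., {}].

How can I do the following?

I can insert the ',', but how do I remove the last ','?
How do I insert '[' at the very beginning and ']' at the very end? I tried putting it in outputfile.write('[' ... etc.), but it showed up in too many places.
How do I keep the header out of the first line of the JSON file?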

Names.csv:

id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith 
345,Statistics ,"Matt P, Albert Shaw"
456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
678,Physics,"Joe Doe, Jane Smith, Ali Smith "

Code:

import csv
import json
import os

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         outfile.write("," + "\n" )

Output so far:

{"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
{"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
{"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
{"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
{"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
{"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},

Answer 1

First, how do you skip the header? That's easy:

next(infile) # skip the first line
for line in infile:

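However, you may want to consider using csv.DictReader for the input. It reads the header row and uses it to build a dict for each row, and it splits the rows for you (including cases you may not have thought of, such as the quoted or escaped text that can appear in CSV files):

for row in csv.DictReader(infile):
    json.dump(row, outfile)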
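
To make the difference concrete, here is a minimal sketch that compares line.split(',') with csv.DictReader on one row from the question. The io.StringIO stand-in for names.csv is an assumption made only to keep the example self-contained.

```python
import csv
import io

# Stand-in for names.csv (assumption: same layout as the file in the question).
sample = 'id,team_name,team_members\n123,Biology,"Ali Smith, Jon Doe"\n'

# Naive splitting cuts the quoted field in two and keeps the quote characters.
naive = sample.splitlines()[1].split(',')

# DictReader consumes the header row and keeps the quoted field in one piece.
parsed = list(csv.DictReader(io.StringIO(sample)))
```

Here naive ends up with four fragments, while parsed holds a single dict whose team_members value is the whole string "Ali Smith, Jon Doe".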

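Now for the harder problem.

A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this: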

def rows(infile):
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         yield row

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    genjson.dump(rows(infile), outfile)

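The documentation for the stdlib json.JSONEncoder has an example that does this, although not efficiently, because it first consumes the entire iterator to build a list and then dumps that: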

class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
       try:
           iterable = iter(o)
       except TypeError:
           pass
       else:
           return list(iterable)
       # Let the base class default method raise the TypeError
       return json.JSONEncoder.default(self, o)

j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))

In fact, if you're willing to build a complete list rather than encode row by row, it may be simpler to do the listifying explicitly:

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)

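You can take this further by overriding the iterencode method, but that is much less trivial, and you would probably want to look on PyPI for an efficient, well-tested streaming iterative JSON library rather than building one yourself out of the json module.

In the meantime, here is a direct solution to your question that changes as little of your existing code as possible: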

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish line 0 from the rest
    for i, line in enumerate(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         # add the ,\n _before_ each row except the first
         if i:
             outfile.write(',\n')
         json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')

This trick, special-casing the first element instead of the last, simplifies many problems of this type.
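
The same trick can be factored into a small reusable helper. This is only a sketch: dump_array is a hypothetical name, not part of the json module, and the io.StringIO buffer stands in for the output file.

```python
import io
import json

def dump_array(items, fp):
    """Write items to fp as a JSON array, one element per line, without building a list."""
    fp.write('[\n')
    for i, item in enumerate(items):
        # Emit the separator before every element except the first.
        if i:
            fp.write(',\n')
        json.dump(item, fp)
    fp.write('\n]')

# Usage with a generator, so rows are encoded one at a time.
buf = io.StringIO()
dump_array(({"id": str(n)} for n in (123, 234)), buf)
result = json.loads(buf.getvalue())
```

Because the comma is written before each element rather than after it, there is never a trailing comma to remove, and an empty input still produces a valid empty array.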

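Another way to simplify things is to iterate over adjacent pairs of lines, using a small variation on the pairwise example from the itertools documentation: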

import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         # add the , if there is a next line
         if nextline is not None:
             outfile.write(',')
         outfile.write('\n')
    # write the final ]
    outfile.write(']')

This is just as efficient as the previous version and conceptually simpler, but more abstract.

Answer 2

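With minimal edits to your code, you can build a list of dicts in Python and dump it to the file in one go (assuming your dataset is small enough to fit in memory):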

import csv
import json
import os

rows = []  # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         rows.append(row)  # Append row to list

    json.dump(rows[1:], outfile)  # Write entire list to file (except first row)

As an aside, you shouldn't use id as a variable name in Python, because it is the name of a built-in function.
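
A small sketch of the same unpacking with a name that does not shadow the built-in; team_id is a hypothetical name chosen for illustration, and the sample line is copied from the CSV in the question.

```python
# One data line from names.csv, including its trailing space and newline.
line = '234,Math,Jane Smith \n'

# team_id instead of id, so the built-in id() stays usable in this scope.
team_id, team_name, *team_members = line.rstrip('\n').split(',')
row = {"id": team_id, "team_name": team_name, "team_members": team_members}

# The built-in is untouched and still callable.
still_builtin = callable(id)
```

The JSON key can remain "id"; only the Python variable name needs to change.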

Answer 3

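Pandas can handle this easily:

import json
import pandas as pd

df = pd.read_csv('names.csv', dtype=str)
# Split the comma-separated members into a list and trim whitespace
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda l: [x.strip() for x in l]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)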

Answer 4

It seems it would be much easier to use the csv.DictReader class rather than reinventing the wheel:

import csv
import json

data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)

with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)

Contents of the names1.json file after running it (I used indent=4 only to make it more human-readable):

[
    {
        "id": "123",
        "team_name": "Biology",
        "team_members": "Ali Smith, Jon Doe"
    },
    {
        "id": "234",
        "team_name": "Math",
        "team_members": "Jane Smith"
    },
    {
        "id": "345",
        "team_name": "Statistics ",
        "team_members": "Matt P, Albert Shaw"
    },
    {
        "id": "456",
        "team_name": "Chemistry",
        "team_members": "Andrew M, Matt Shaw, Ali Smith"
    },
    {
        "id": "678",
        "team_name": "Physics",
        "team_members": "Joe Doe, Jane Smith, Ali Smith"
    }
]
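
In the output above, team_members is a single string. If a list of names is wanted instead, as in the question's attempt, one possible sketch is to split and strip the field after DictReader has parsed it. The io.StringIO sample is an assumption standing in for names.csv.

```python
import csv
import io
import json

# Stand-in for names.csv (assumption: same layout as the file in the question).
sample = 'id,team_name,team_members\n456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"\n'

data = []
for row in csv.DictReader(io.StringIO(sample)):
    # Turn "A, B, C" into ["A", "B", "C"], trimming stray whitespace.
    row['team_members'] = [name.strip() for name in row['team_members'].split(',')]
    data.append(row)

text = json.dumps(data)
```

Splitting after parsing, rather than on the raw line, leaves the quote handling to the csv module.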

Adding characters and removing the last comma in a JSON file
