Add characters and remove the last comma in a JSON file

Time: 2018-05-01 15:54:03

Tags: python json

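I am trying to create a JSON file from a CSV. The code below creates the data, but not in the form I want. I have some Python experience. As I understand it, a JSON file should be written like [{},{},...,{}].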

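How can I do the following?

  1. I can insert the ',' but how do I remove the last ','?

  2. How do I insert '[' at the very beginning and ']' at the very end? I tried inserting it with outputfile.write('['... etc), but it shows up in too many places.

  3. Do not include the header as the first row of the JSON file.

Names.csv: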

    id,team_name,team_members
    123,Biology,"Ali Smith, Jon Doe"
    234,Math,Jane Smith 
    345,Statistics ,"Matt P, Albert Shaw"
    456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
    678,Physics,"Joe Doe, Jane Smith, Ali Smith "
    

Code:

    import csv
    import json
    import os
    
    with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
        for line in infile:
             row = dict()
             # print(row)
             id, team_name, *team_members = line.split(',')
             row["id"] = id;
             row["team_name"] = team_name;
             row["team_members"] = team_members
             json.dump(row,outfile)
             outfile.write("," + "\n" )
    

Output so far:

    {"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
    {"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
    {"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
    {"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
    {"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
    {"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},
    

4 answers:

Answer 0: (score: 2)

First, how do you skip the header? That's easy:

next(infile) # skip the first line
for line in infile:

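However, you may want to consider using csv.DictReader for your input. It reads the header row and uses it to build a dict for each row, and it splits the rows for you (including cases you may not have thought of, such as quoted or escaped text that can appear in CSV files):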

for row in csv.DictReader(infile):
    json.dump(row, outfile)
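
To make the claim about quoted text concrete, here is a minimal sketch that feeds csv.DictReader two rows from the question's sample data. The in-memory `sample` buffer is an assumption standing in for names.csv so the snippet runs on its own.

```python
import csv
import io

# In-memory stand-in for names.csv (an assumption, so the sketch is self-contained).
sample = io.StringIO(
    'id,team_name,team_members\n'
    '123,Biology,"Ali Smith, Jon Doe"\n'
    '234,Math,Jane Smith\n'
)

rows = list(csv.DictReader(sample))

# The header line is consumed as field names, so only the data rows remain.
print(len(rows))
# The quoted field is kept as one value instead of being cut at its inner comma.
print(rows[0]["team_members"])
```

Compare this with line.split(','), which breaks "Ali Smith, Jon Doe" into two pieces and leaves the quote characters in the data.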

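Now for the harder problem.

A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this: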

def rows(infile):
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         yield row

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    genjson.dump(rows(infile), outfile)

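The stdlib json.JSONEncoder documentation has an example that does exactly this, although not efficiently, because it first consumes the entire iterator to build a list and then dumps that: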

class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
       try:
           iterable = iter(o)
       except TypeError:
           pass
       else:
           return list(iterable)
       # Let the base class default method raise the TypeError
       return json.JSONEncoder.default(self, o)

j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))

In fact, if you are willing to build a whole list rather than encode row by row, it may be simpler to do the listifying explicitly:

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)

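You can take this further by overriding the iterencode method, but that is a lot less trivial, and you would probably want to look on PyPI for an efficient, well-tested streaming iterative JSON library instead of building one yourself out of the json module.

In the meantime, here is a direct solution to your question that changes as little as possible of your existing code: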

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish line 0 from the rest
    for i, line in enumerate(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         # add the ,\n _before_ each row except the first
         if i:
             outfile.write(',\n')
         json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')

This trick, special-casing the first element rather than the last, simplifies a lot of problems of this type.
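
The same trick can be packaged as a small reusable helper that accepts any iterable, including a generator, without first building a list. This is only a sketch: `dump_array` is a hypothetical name, not part of the json module, and the StringIO buffer stands in for an output file.

```python
import io
import json

def dump_array(items, outfile):
    """Write any iterable to outfile as a JSON array, one element per line."""
    outfile.write('[')
    for i, item in enumerate(items):
        # The separator goes before every element except the first,
        # so no trailing comma is ever written.
        outfile.write(',\n' if i else '\n')
        json.dump(item, outfile)
    outfile.write('\n]')

# Usage with a generator; nothing is materialized as a list.
buf = io.StringIO()
dump_array(({"n": n} for n in range(3)), buf)
result = json.loads(buf.getvalue())
```

An empty iterable produces "[" followed by a newline and "]", which is still valid JSON.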

Another way to simplify things is to iterate over adjacent pairs of lines, using a minor variation on the pairwise example in the itertools docs:

import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         # add the , if there is a next line
         if nextline is not None:
             outfile.write(',')
         outfile.write('\n')
    # write the final ]
    outfile.write(']')

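This is just as efficient as the previous version and conceptually simpler, but it is more abstract.

Answer 1: (score: 0)

With minimal edits to your code, you can build a list of dictionaries in Python and dump it to the file in one go (assuming your dataset is small enough to fit in memory):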

import csv
import json
import os

rows = []  # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         rows.append(row)  # Append row to list

    json.dump(rows[1:], outfile)  # Write entire list to file (except first row)

By the way, you shouldn't use id as a variable name in Python, because it is the name of a built-in function.
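
A minimal sketch of why shadowing the name causes trouble: inside the function below, the local variable hides the built-in id(), so calling it fails. The function name `shadowed` is made up for illustration.

```python
def shadowed():
    id = "123"  # this local name now hides the built-in id() function
    try:
        return id(object())  # tries to call the string "123"
    except TypeError as exc:
        return str(exc)

message = shadowed()
print(message)
```

Renaming the variable to something like team_id avoids the problem; the JSON key can still be "id".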

Answer 2: (score: 0)

Pandas can handle this easily:

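import json
import pandas as pd

df = pd.read_csv('names.csv', dtype=str)
# Split each team_members string on commas and strip whitespace from each name
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda l: [x.strip() for x in l]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)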

Answer 3: (score: 0)

It seems it would be much easier to use the csv.DictReader class instead of reinventing the wheel:

import csv
import json

data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)

with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)

Contents of the names1.json file after running this (I used indent=4 only to make it more human-readable):

[
    {
        "id": "123",
        "team_name": "Biology",
        "team_members": "Ali Smith, Jon Doe"
    },
    {
        "id": "234",
        "team_name": "Math",
        "team_members": "Jane Smith"
    },
    {
        "id": "345",
        "team_name": "Statistics ",
        "team_members": "Matt P, Albert Shaw"
    },
    {
        "id": "456",
        "team_name": "Chemistry",
        "team_members": "Andrew M, Matt Shaw, Ali Smith"
    },
    {
        "id": "678",
        "team_name": "Physics",
        "team_members": "Joe Doe, Jane Smith, Ali Smith"
    }
]
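
In the output above, team_members is still a single string. If a list of names is wanted, as in the question's own attempt, one possible extension is to split that field after csv.DictReader has parsed the row. This is a sketch under that assumption; the in-memory `sample` buffer stands in for names.csv.

```python
import csv
import io
import json

# In-memory stand-in for names.csv (an assumption, so the sketch is self-contained).
sample = io.StringIO(
    'id,team_name,team_members\n'
    '123,Biology,"Ali Smith, Jon Doe"\n'
    '234,Math,Jane Smith \n'
)

data = []
for row in csv.DictReader(sample):
    # Split the single string on commas and trim stray whitespace around each name.
    row["team_members"] = [name.strip() for name in row["team_members"].split(",")]
    data.append(row)

text = json.dumps(data, indent=4)
print(text)
```

Splitting after the CSV parser has run keeps the quoting rules intact, so the quote characters never end up inside the names.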