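I'm trying to create a JSON file from a CSV. The code below produces the data, but not in the form I want. I have some experience with Python. As I understand it, a JSON file should be written like [{},{},...,{}].
How do I:
Insert ',' between the objects? I can insert ',' but how do I remove the last ','?
Insert '[' at the very beginning and ']' at the very end? I tried writing it with outfile.write('[' ...) etc., but it showed up in too many places.
Leave the header out of the first line of the JSON file?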
Names.csv:
id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith
345,Statistics ,"Matt P, Albert Shaw"
456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
678,Physics,"Joe Doe, Jane Smith, Ali Smith "
Code:
import csv
import json
import os

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id
        row["team_name"] = team_name
        row["team_members"] = team_members
        json.dump(row,outfile)
        outfile.write("," + "\n" )
Output so far:
{"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
{"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
{"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
{"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
{"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
{"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},
Answer 0 (score: 2)
First, how do you skip the header? That's easy:
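with open('names.csv', 'r') as infile, open('names1.json', 'w') as outfile:
    next(infile)  # skip the first line
    for line in infile:
        ...  # process each data line as before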
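However, you may want to consider using csv.DictReader for the input. It reads the header row, uses it to build a dict for each row, and splits the rows for you (including cases you may not have thought of, such as the quoted or escaped text that can appear in CSV files):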
for row in csv.DictReader(infile):
    json.dump(row, outfile)
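To see what that buys you on the question's data, here is a small self-contained check. It feeds a few lines of Names.csv to csv.DictReader through io.StringIO, which is only an in-memory stand-in for the real file used for illustration. The quoted team_members field stays in one piece instead of being split at its inner comma:

```python
import csv
import io

# A few lines of Names.csv, held in memory instead of on disk.
SAMPLE = """id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith
"""

# DictReader takes its keys from the header row and honours the quotes,
# so the comma inside "Ali Smith, Jon Doe" does not split the field.
rows = list(csv.DictReader(io.StringIO(SAMPLE, newline='')))

for row in rows:
    print(row)
```

Note that the header line is consumed by DictReader itself and never appears as a data row.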
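Now for the harder problem.
A better solution might be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this: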
def rows(infile):
    for line in infile:
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id
        row["team_name"] = team_name
        row["team_members"] = team_members
        yield row

# genjson is a placeholder for whichever iterative JSON library you choose
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    genjson.dump(rows(infile), outfile)
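genjson above is only a placeholder for such a library. If you'd rather not install anything, a minimal stand-in with the same call shape can be written with the stdlib json module. dump_iter is a hypothetical name, and this sketch only streams the top-level sequence; each element is still encoded in one go:

```python
import json

def dump_iter(items, fp):
    """Write any iterable of JSON-serialisable objects to fp as one JSON array,
    consuming it lazily so the whole sequence never has to sit in memory."""
    fp.write('[')
    for i, item in enumerate(items):
        if i:  # a separator before every element except the first
            fp.write(',\n')
        json.dump(item, fp)
    fp.write(']\n')
```

With that defined, dump_iter(rows(infile), outfile) would take the place of genjson.dump(rows(infile), outfile).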
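The stdlib json.JSONEncoder documentation has an example that does this, though not efficiently, because it first consumes the whole iterator to build a list and only then dumps it: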
import json

class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)

j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))
In fact, if you're willing to build a complete list rather than encoding row by row, it's probably simpler to convert to a list explicitly:
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)
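You could go further by overriding the iterencode method, but that is much less simple, and you'd probably rather find an efficient, well-tested streaming JSON library on PyPI than build one yourself on top of the json module.
In the meantime, though, here is a direct solution to your problem that changes your existing code as little as possible: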
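For a sense of what overriding iterencode involves, here is a rough sketch. StreamingEncoder is a hypothetical name, and it only streams the outermost object when that object is an iterator such as a generator; everything nested inside is handed to the normal JSONEncoder machinery, and it makes no attempt to lay out the top level nicely when indent is used:

```python
import collections.abc
import json

class StreamingEncoder(json.JSONEncoder):
    """Sketch: encode a top-level iterator (e.g. a generator) as a JSON array
    without first turning it into a list."""

    def iterencode(self, o, _one_shot=False):
        # Lists, dicts and strings are iterables but not iterators,
        # so they fall through to the standard encoder unchanged.
        if not isinstance(o, collections.abc.Iterator):
            yield from super().iterencode(o, _one_shot)
            return
        yield '['
        for i, item in enumerate(o):
            if i:
                yield self.item_separator  # ', ' unless separators/indent change it
            yield from super().iterencode(item, _one_shot)
        yield ']'
```

Because json.dump writes whatever chunks the encoder's iterencode yields, json.dump(rows(infile), outfile, cls=StreamingEncoder) would then write the rows one at a time.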
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # skip the header row
    next(infile)
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish the first data row from the rest
    for i, line in enumerate(infile):
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id
        row["team_name"] = team_name
        row["team_members"] = team_members
        # add the ,\n _before_ each row except the first
        if i:
            outfile.write(',\n')
        json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')
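This trick of special-casing the first element instead of the last simplifies many problems of this kind.
Another way to simplify things is to use a small variation of the pairwise recipe from the itertools documentation to iterate over pairs of adjacent lines: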
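When all the rows fit in memory, str.join gives the same "separator only between elements" behaviour with no index bookkeeping at all, because it never emits a separator after the last item. A small sketch (rows_to_json_array is a hypothetical helper name):

```python
import json

def rows_to_json_array(rows):
    """Render a list of dicts as a JSON array, one object per line."""
    # join() puts ",\n" only *between* items, so there is no trailing comma to remove.
    return '[\n' + ',\n'.join(json.dumps(row) for row in rows) + '\n]'

print(rows_to_json_array([{"id": "123"}, {"id": "234"}]))
```

The trade-off is the same as with json.dump(list(...)): the whole output string is built before anything is written.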
import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # skip the header row
    next(infile)
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
        row = dict()
        # print(row)
        id, team_name, *team_members = line.split(',')
        row["id"] = id
        row["team_name"] = team_name
        row["team_members"] = team_members
        json.dump(row,outfile)
        # add the , if there is a next line
        if nextline is not None:
            outfile.write(',')
        outfile.write('\n')
    # write the final ]
    outfile.write(']')
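This is just as efficient as the previous version and conceptually simpler, though more abstract.
Answer 1 (score: 0)
With minimal edits to your code, you can build a list of dicts in Python and dump it to the file in one go (assuming your dataset is small enough to fit in memory):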
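To make the pairs concrete, here is the same helper applied to a short list. The last element is paired with None, which is exactly what the `if nextline is not None` test relies on. (Python 3.10 added itertools.pairwise, but it stops at the last complete pair, so the final line would never show up as `line`; that's why the zip_longest variation is used here.)

```python
import itertools

def pairwise(iterable):
    # Pair each item with its successor; the last item is paired with None.
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

print(list(pairwise(['a', 'b', 'c'])))
# → [('a', 'b'), ('b', 'c'), ('c', None)]
```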
import csv
import json
import os

rows = []  # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
        row = dict()
        id, team_name, *team_members = line.split(',')
        row["id"] = id
        row["team_name"] = team_name
        row["team_members"] = team_members
        rows.append(row)  # Append row to list
    json.dump(rows[1:], outfile)  # Write entire list to file (except first row)
By the way, you shouldn't use id as a variable name in Python, because it shadows the built-in function id().
Answer 2 (score: 0)
Pandas handles this easily:
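A quick illustration of what goes wrong: once id is bound to a string, calling id() in that scope fails. The function name shadow_id exists only for this demo; renaming the variable (for example to team_id, an arbitrary choice) avoids the problem.

```python
def shadow_id():
    id = "123"        # rebinds the name locally, hiding the built-in id()
    try:
        id(object())  # this now tries to call the string "123"
    except TypeError as exc:
        return str(exc)
    return "no error"

print(shadow_id())
```

Outside the function the built-in is untouched, because the shadowing name is local.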
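import json
import pandas as pd

df = pd.read_csv('names.csv', dtype=str)
# split "A, B, C" into ['A', 'B', 'C'] with surrounding spaces removed
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda parts: [x.strip() for x in parts]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)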
Answer 3 (score: 0)
It seems much easier to use the csv.DictReader class instead of reinventing the wheel:
import csv
import json

data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)

with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)
Contents of the names1.json file after running it (I used indent=4 only to make it more human-readable):
[
    {
        "id": "123",
        "team_name": "Biology",
        "team_members": "Ali Smith, Jon Doe"
    },
    {
        "id": "234",
        "team_name": "Math",
        "team_members": "Jane Smith"
    },
    {
        "id": "345",
        "team_name": "Statistics ",
        "team_members": "Matt P, Albert Shaw"
    },
    {
        "id": "456",
        "team_name": "Chemistry",
        "team_members": "Andrew M, Matt Shaw, Ali Smith"
    },
    {
        "id": "678",
        "team_name": "Physics",
        "team_members": "Joe Doe, Jane Smith, Ali Smith"
    }
]
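If you also want team_members as a list, as in the question's original output, a stdlib-only variation of the DictReader approach could split and trim it while reading. This is a sketch under two assumptions: every row is well-formed (no missing or extra fields), and trimming whitespace from every value (such as the trailing space in "Statistics ") is what you want. load_teams is a hypothetical helper name.

```python
import csv
import json

def load_teams(path):
    """Read the CSV into a list of dicts, turning team_members into a list."""
    teams = []
    with open(path, 'r', newline='') as infile:
        for row in csv.DictReader(infile):
            # Strip stray spaces from every value, e.g. "Statistics " -> "Statistics".
            row = {key: value.strip() for key, value in row.items()}
            # "Ali Smith, Jon Doe" -> ["Ali Smith", "Jon Doe"]
            row['team_members'] = [m.strip() for m in row['team_members'].split(',')]
            teams.append(row)
    return teams
```

It would be used the same way as above, e.g. json.dump(load_teams('names.csv'), outfile, indent=4).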