I have a CSV file that I need to write out as JSON files of 1,000 rows each. The CSV file has roughly 9,000 rows, so ideally I'd end up with 9 separate JSON files of consecutive data.
I know how to write a CSV file to JSON; here is what I've been doing:
import csv
import json

csvfile = open("C:\\Users\\Me\\Desktop\\data\\data.csv", 'r', encoding="utf8")
reader = csv.DictReader(csvfile, delimiter=",")
out = json.dumps([row for row in reader])
with open("C:\\Users\\Me\\Desktop\\data\\data.json", 'w') as f:
    f.write(out)
This works fine, but I need the output split across 9 JSON files. Right now, I figure I either:
1) try to count rows and stop once I reach 1,000, or
2) write the CSV to a single JSON file, then open that JSON and somehow split it.
I'm at a loss as to how to accomplish this; any help is appreciated!
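Option 1 (counting rows as you go) can be sketched roughly as follows; the function name `csv_to_json_chunks` and the in-memory `io.StringIO` source are hypothetical choices for illustration, not part of the question:

```python
import csv
import io
import json

def csv_to_json_chunks(csv_text, chunk_size):
    """Split CSV rows into consecutive chunks of up to chunk_size rows,
    returning one JSON string per chunk."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == chunk_size:  # chunk is full: serialize and reset
            chunks.append(json.dumps(current))
            current = []
    if current:  # leftover rows that did not fill a whole chunk
        chunks.append(json.dumps(current))
    return chunks
```

With ~9,000 rows and chunk_size=1000 this yields 9 JSON strings (plus one shorter string for any remainder), which could then be written to data_1.json, data_2.json, and so on.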
Answer 0 (score: 2)
Read the whole CSV file into a list of rows, then write slices of length 1,000 to the JSON files.
import csv
import json

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    rows = list(reader)

# ceiling division, so a final partial slice still gets its own file
for i in range((len(rows) + 999) // 1000):
    out = json.dumps(rows[1000 * i:1000 * (i + 1)])
    with open(output_file_template.format(i), 'w') as f:
        f.write(out)
Answer 1 (score: 2)
Instead of reading the whole CSV file into memory, you can iterate over it (reducing memory usage). For example, here is a simple iteration over the rows:
with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        print(row)
While iterating, you can enumerate the rows and use that index to compute groups of 1,000 rows:
group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for index, row in enumerate(reader):
        group_idx = index // group_size
        print(group_idx, row)
You should get something like this:
0 [row 0...]
0 [row 1...]
0 [row 2...]
...
0 [row 999...]
1 [row 1000...]
1 [row 1001...]
etc.
You can use itertools.groupby to group your rows in batches of 1,000.
Building on Alberto Garcia-Raboso's solution, you can use:
from __future__ import division
import csv
import json
import itertools

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    # group consecutive rows by their index divided by the group size
    for key, group in itertools.groupby(enumerate(reader),
                                        key=lambda item: item[0] // group_size):
        grp_rows = [item[1] for item in group]
        content = json.dumps(grp_rows)
        with open(output_file_template.format(key), 'w') as jsonfile:
            jsonfile.write(content)
For example, with some fake data:
from __future__ import division
import itertools

rows = [[1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8]]

group_size = 4

for key, group in itertools.groupby(enumerate(rows),
                                    key=lambda item: item[0] // group_size):
    g_rows = [item[1] for item in group]
    print(key, g_rows)
You get:
0 [[1, 2], [3, 4], [5, 6], [7, 8]]
1 [[1, 2], [3, 4], [5, 6], [7, 8]]
2 [[1, 2], [3, 4], [5, 6], [7, 8]]
3 [[1, 2], [3, 4], [5, 6], [7, 8]]
4 [[1, 2], [3, 4], [5, 6], [7, 8]]
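As an aside, on Python 3.12 and later the standard library also offers itertools.batched, which performs the same fixed-size grouping directly; a sketch under that version assumption (the hasattr guard keeps it harmless on older interpreters):

```python
import itertools

rows = [[1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6]]

if hasattr(itertools, "batched"):  # itertools.batched requires Python 3.12+
    # batched yields tuples of up to 4 consecutive items
    for key, group in enumerate(itertools.batched(rows, 4)):
        print(key, list(group))
```

On 3.12+ this prints group 0 with the first four rows and group 1 with the remaining three, with no enumerate-and-divide key function needed.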
Answer 2 (score: 0)
There is no reason to use DictReader here; a regular csv.reader will do fine. You can also use itertools.islice on the reader object to slice the data into groups of n rows, dumping each set into a new file:
from itertools import islice, count
import csv
import json

with open("C:\\Users\\Me\\Desktop\\data\\data.csv") as f:
    reader, cnt = csv.reader(f), count(1)
    # islice pulls up to 1000 rows at a time; iter() stops on an empty list
    for rows in iter(lambda: list(islice(reader, 1000)), []):
        with open("C:\\Users\\Me\\Desktop\\data\\data{}.json".format(next(cnt)), 'w') as out:
            json.dump(rows, out)
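The two-argument iter(callable, sentinel) form used above may be unfamiliar: iter keeps calling the callable and stops as soon as it returns the sentinel value (here an empty list, meaning islice found no more rows). A minimal standalone demonstration of the same pattern, using a plain range instead of a CSV reader:

```python
data = iter(range(7))

def next_batch():
    # take up to 3 items from the shared iterator
    return [x for _, x in zip(range(3), data)]

# iter(callable, sentinel): call next_batch until it returns []
for batch in iter(next_batch, []):
    print(batch)
```

This prints [0, 1, 2], then [3, 4, 5], then [6], and stops once the source is exhausted.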
Answer 3 (score: -1)
This reads data.csv once and writes a separate JSON file for each batch of 1,000 rows, producing data_1.json through data_9.json for 9,000 rows. It creates one file per 1,000 rows without any code changes, however many rows data.csv contains.
import csv
import json

csvfile = open("C:\\Users\\Me\\Desktop\\data\\data.csv", 'r', encoding="utf8")
reader = csv.DictReader(csvfile, delimiter=",")

r = []
counter = 0
fileid = 1
for row in reader:
    r.append(row)
    counter += 1
    if counter == 1000:
        out = json.dumps(r)
        fname = "C:\\Users\\Me\\Desktop\\data\\data_" + str(fileid) + ".json"
        with open(fname, 'w') as f:
            f.write(out)
        # resetting & updating variables
        fileid += 1
        counter = 0
        r = []

if r:
    # write any leftover rows that did not fill a full batch of 1,000
    out = json.dumps(r)
    fname = "C:\\Users\\Me\\Desktop\\data\\data_" + str(fileid) + ".json"
    with open(fname, 'w') as f:
        f.write(out)