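I'm trying to write a program that splits a large CSV file into smaller files. It mostly works, except that it never closes the last file, which means it never finishes writing that file. This is what I have: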
import csv

# length of original file = 1000 rows
length_of_new_file = 100  # rows

def file_splitter(file_name, desired_length):
    with open("{}".format(file_name), 'r') as original_file:
        header = original_file.readline()
        file_reader = csv.reader(original_file, dialect='excel')
        file_count = 0
        new_name = 'split_file_test'
        loop = 0
        while file_reader:
            with open("{}{}.csv".format(new_name, file_count), 'w', newline='') as new_file:
                new_file.write(header)
                csv_writer = csv.writer(new_file, delimiter=',')
                for line in file_reader:
                    if loop == (desired_length-1):
                        csv_writer.writerow(line)
                        new_file.close()
                        file_count += 1
                        loop = 0
                        break
                    else:
                        csv_writer.writerow(line)
                        loop += 1

test_file = 'zlotsacontacts.csv'
file_splitter(test_file, length_of_new_file)
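I have tried adding new_file.close(), but no matter where I put it, the last file never seems to get closed. I have also tried different conditions in the outermost while loop, such as:
while file_reader != '':
and
while file_reader not None:
but from what I have found, the csv module does not recognize a None value. I don't know what I can do to end this loop!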
Answer 0 (score: 2)
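with open closes the file automatically when its block ends, so no explicit close() is needed.
The while loop runs forever because the only condition it checks is while file_reader. The file_reader object always exists, and a csv.reader object is truthy even after every row has been read, so the condition stays true.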
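To make the point about the loop condition concrete, here is a small sketch showing that a csv.reader object stays truthy after its rows are used up. The in-memory sample data is made up for illustration and is not from the question.

```python
import csv
import io

# Made-up sample data held in memory instead of a real file.
reader = csv.reader(io.StringIO("a,b\n1,2\n"))

rows = list(reader)        # consumes every row
print(bool(reader))        # True: the reader object itself is still truthy
print(next(reader, None))  # None: there are no rows left to read
```

So `while file_reader` cannot detect the end of the input. Something else, such as a file count or an empty read, has to end the loop.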
A better approach is to use a loop that takes the number of files into account.
Something like:
while file_count < number_of_files:
    ...
Or, as an example:
num_files = 5
count = 0
while count < num_files:
    print(count)
    count += 1
That way the while loop ends once it has iterated over all the files and the last file has been closed.
If you need to find out how many rows are in the file, you can count them like this:
import csv

with open('lines.csv', newline='') as lines:
    l = csv.reader(lines)  # reads row by row, so larger files are handled much better
    row_count = sum(1 for row in l) - 1  # -1 to not count the header row, if it exists
    print(row_count)
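The two pieces of this answer can be combined into one routine: count the rows first, derive the number of files, then let that number drive the while loop. The following is only a sketch under assumptions. The function name split_by_count, its parameters, and the use of itertools.islice to take one file's worth of rows are illustrative choices, not part of the original answer.

```python
import csv
import itertools
import math


def split_by_count(source_path, rows_per_file, prefix="split_file_test"):
    """Hypothetical helper: split a CSV into files of rows_per_file data rows."""
    # First pass: count the data rows (everything except the header).
    with open(source_path, newline="") as source:
        row_count = max(sum(1 for _ in csv.reader(source)) - 1, 0)

    # Round up so a partial final chunk still gets its own file.
    number_of_files = math.ceil(row_count / rows_per_file)

    written = []
    with open(source_path, newline="") as source:
        reader = csv.reader(source)
        header = next(reader, None)  # None when the source file is empty
        file_count = 0
        while file_count < number_of_files:
            out_name = "{}{}.csv".format(prefix, file_count)
            # The with block closes every output file, including the last one.
            with open(out_name, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                # Take at most rows_per_file rows from the shared reader.
                writer.writerows(itertools.islice(reader, rows_per_file))
            written.append(out_name)
            file_count += 1
    return written
```

With the question's numbers, split_by_count('zlotsacontacts.csv', 100) would be expected to produce ten files for a 1000-row input. The cost of this design is that the source file is read twice.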
Answer 1 (score: 0)
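I should have spent more time thinking about it. By moving 'for line' to the outermost loop, I can check whether a new file is needed (and close it once it is complete), which solves the infinite loop problem: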
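import csv

def file_splitter(submitted_file, desired_length):
    with open(submitted_file, 'r', newline='') as original_file:
        header = original_file.readline()
        file_reader = csv.reader(original_file, dialect='excel')
        file_count = 0
        new_name = 'a_file_test'
        loop = 0  # rows written to the current output file
        new_file = None
        csv_writer = None
        for line in file_reader:
            # Open a new output file for the first row and after each full file.
            if new_file is None or loop == 0:
                new_file = open('{0}{1}.csv'.format(new_name, file_count), 'w', newline='')
                new_file.write(header)
                csv_writer = csv.writer(new_file, delimiter=',')
            csv_writer.writerow(line)
            loop += 1
            # loop was already incremented, so compare against desired_length itself.
            if loop == desired_length:
                new_file.close()
                file_count += 1
                loop = 0
        # Close the last file when the row count is not a multiple of desired_length.
        if new_file is not None and not new_file.closed:
            new_file.close()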
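For comparison, the manual bookkeeping with new_file and loop can be avoided by reading one file's worth of rows at a time and writing each chunk inside its own with block. This is a sketch of an alternative, not the method used above. The name split_in_chunks and its parameters are hypothetical.

```python
import csv
import itertools


def split_in_chunks(source_path, rows_per_file, prefix="a_file_test"):
    """Hypothetical helper: write rows_per_file data rows per output file."""
    written = []
    with open(source_path, newline="") as source:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:  # empty source file, nothing to split
            return written
        while True:
            # Pull up to rows_per_file rows; an empty list means the input is done.
            chunk = list(itertools.islice(reader, rows_per_file))
            if not chunk:
                break
            out_name = "{}{}.csv".format(prefix, len(written))
            # Each output file is closed as soon as its with block exits.
            with open(out_name, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(out_name)
    return written
```

Because a file is only opened once there are rows to put in it, an input whose row count is an exact multiple of rows_per_file does not leave a trailing file that contains only the header.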