Python 3 - CSV reader reaching the end of the file

Time: 2018-04-14 02:51:06

Tags: python python-3.x csv writer reader

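I'm trying to write a program that splits a large CSV file into smaller files. It mostly works, except that it never closes the last file, which means it never finishes writing that file. This is what I have: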

import csv

# length of original file = 1000 rows
length_of_new_file = 100  # rows


def file_splitter(file_name, desired_length):
    with open("{}".format(file_name), 'r') as original_file:
        header = original_file.readline()
        file_reader = csv.reader(original_file,dialect='excel')
        file_count = 0
        new_name = 'split_file_test'
        loop = 0
        while file_reader:
            with open("{}{}.csv".format(new_name, file_count), 'w', newline='') as new_file:
                new_file.write(header)
                csv_writer = csv.writer(new_file, delimiter=',')
                for line in file_reader:
                    if loop == (desired_length-1):
                        csv_writer.writerow(line)
                        new_file.close()
                        file_count += 1
                        loop = 0
                        break
                    else:
                        csv_writer.writerow(line)
                        loop += 1


test_file = 'zlotsacontacts.csv'

file_splitter(test_file, length_of_new_file)

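I tried adding new_file.close(), but no matter where I put it, the last file never seems to get closed. I also tried different conditions on the outermost while loop, such as: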

while file_reader != '':

while file_reader not None:

But from what I've found, the csv module doesn't recognize a None value here. I don't know what I can do to end this loop!

2 answers:

Answer 0: (score: 2)

with open closes the file automatically as soon as the with block ends.
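As a quick check of that behavior, here is a minimal sketch. The scratch file path is made up for the demonstration and has nothing to do with the asker's data:

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(path, "w", newline="") as demo_file:
    demo_file.write("a,b\n")
    closed_inside = demo_file.closed   # the file is still open inside the block

closed_after = demo_file.closed        # it is closed once the block is left
print(closed_inside, closed_after)     # False True
```

So an explicit new_file.close() inside the with block should not be needed.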

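The while loop never ends because the only condition it checks is while file_reader. The file_reader object exists, so the condition is always true, even after every row has been read.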
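To illustrate, here is a small sketch using an in-memory file with made-up data. It shows that an exhausted csv.reader object is still truthy, and that next() with a default value is one way to detect the end instead:

```python
import csv
import io

# In-memory stand-in for a small CSV file (hypothetical data).
source = io.StringIO("name,age\nann,30\nbob,25\n")
reader = csv.reader(source)

rows = list(reader)            # consume every row
still_truthy = bool(reader)    # the exhausted reader object is still truthy
sentinel = next(reader, None)  # next() with a default signals exhaustion

print(len(rows), still_truthy, sentinel)  # 3 True None
```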

A better approach is to use a loop that takes the number of files into account.

Something like:

while file_count < number_of_files:
     ...

Or, as an example:

num_files = 5

count = 0

while count < num_files:
    print(count)
    count += 1

That way, the while loop ends once it has gone through all the files and the last one has been closed.

If you need to find out how many rows the file has, you can count them like this:

import csv

with open('lines.csv') as lines:
    l = csv.reader(lines) # will read in larger files much better
    row_count = sum(1 for row in l) - 1 # -1 to not count the header row, if it exists.
print(row_count)
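With the row count in hand, the number of output files follows by rounding up. A minimal sketch; the helper name count_output_files is an assumption and does not appear in the original post:

```python
import math


def count_output_files(row_count, rows_per_file):
    """Return how many files are needed to hold row_count data rows."""
    # Round up so a partially filled last file is still counted.
    return math.ceil(row_count / rows_per_file)


print(count_output_files(1000, 100))  # 10
print(count_output_files(1001, 100))  # 11
```

The result could serve as number_of_files in the while file_count < number_of_files loop above.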

Answer 1: (score: 0)

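I should have spent more time thinking about it. By moving the 'for line' loop to the outermost level, I can check whether a new file is needed (and close it once it is finished), which solves the infinite-loop problem: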

import csv


def file_splitter(submitted_file, desired_length):
    with open(submitted_file, 'r') as original_file:
        header = original_file.readline()
        file_reader = csv.reader(original_file, dialect='excel')
        file_count = 0
        new_name = 'a_file_test'
        loop = 0  # rows written to the current output file
        new_file = None
        csv_writer = None
        for line in file_reader:
            if loop == 0:
                # Open a new output file only when there is a row to put in it.
                new_file = open('{0}{1}.csv'.format(new_name, file_count), 'w', newline='')
                new_file.write(header)
                csv_writer = csv.writer(new_file, delimiter=',')
            csv_writer.writerow(line)
            loop += 1
            if loop == desired_length:
                # The current file is full: close it and start counting again.
                new_file.close()
                file_count += 1
                loop = 0
        # Close the last file if it was left partially filled.
        if new_file is not None and not new_file.closed:
            new_file.close()
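For comparison, here is a sketch of a different technique from the one in this answer: itertools.islice pulls up to a fixed number of rows from the reader at a time, and each output file is opened in its own with block so it is always closed. The function name split_csv and the prefix parameter are assumptions made for this illustration:

```python
import csv
from itertools import islice


def split_csv(source_path, rows_per_file, prefix="split_"):
    """Split source_path into files of at most rows_per_file data rows each."""
    written = []
    with open(source_path, newline="") as source:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:          # empty input: nothing to split
            return written
        while True:
            # Take the next batch of rows; the list is empty at end of file.
            chunk = list(islice(reader, rows_per_file))
            if not chunk:           # reader exhausted, stop before opening a file
                break
            out_path = "{}{}.csv".format(prefix, len(written))
            with open(out_path, "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
            written.append(out_path)
    return written
```

Because the loop stops on an empty chunk, no trailing header-only file is created, and the returned list gives the paths of the files that were written.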