Python error 24: Too many open files: per-process limit?

Date: 2016-02-17 15:52:25

Tags: python csv

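When I use this Python code to split a large CSV into smaller CSV files, I get the error:

OSError: [Errno 24] Too many open files:

After running it there should be 29,930 separate files, but it stops after 2048.

I did some research, and it looks like there is a per-process limit of 2048. How can I get around this?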

    #!/usr/bin/env python3
    import binascii
    import csv
    import os.path
    import sys
    from tkinter.filedialog import askopenfilename, askdirectory
    from tkinter.simpledialog import askinteger

    def split_csv_file(f, dst_dir, keyfunc):
        csv_reader = csv.reader(f)
        header = next(csv_reader)
        csv_writers = {}
        for row in csv_reader:
            k = keyfunc(row)
            if k not in csv_writers:
                writer = csv.writer(open(os.path.join(dst_dir, k),
                                         mode='w', newline=''))
                writer.writerow(header)
                csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])

    def get_args_from_cli():
        input_filename = sys.argv[1]
        column = int(sys.argv[2])
        dst_dir = sys.argv[3]
        return (input_filename, column, dst_dir)

    def get_args_from_gui():
        input_filename = askopenfilename(
            filetypes=(('CSV', '.csv'),),
            title='Select CSV Input File')
        column = askinteger('Choose Table Column', 'Table column')
        dst_dir = askdirectory(title='Select Destination Directory')
        return (input_filename, column, dst_dir)

    if __name__ == '__main__':
        if len(sys.argv) == 1:
            input_filename, column, dst_dir = get_args_from_gui()
        elif len(sys.argv) == 4:
            input_filename, column, dst_dir = get_args_from_cli()
        else:
            raise Exception("Invalid number of arguments")
        with open(input_filename, mode='r', newline='') as f:
            split_csv_file(f, dst_dir, lambda r: r[column-1]+'.csv')
            # if the column has funky values resulting in invalid filenames
            # replace the line from above with:
            # split_csv_file(f, dst_dir, lambda r: binascii.b2a_hex(r[column-1].encode('utf-8')).decode('utf-8')+'.csv')
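On Unix-like systems (Linux, macOS), the per-process cap on open files is the RLIMIT_NOFILE resource limit, which can be read and adjusted through the standard-library `resource` module. The following is a minimal sketch, assuming a Unix-like platform (the module does not exist on Windows); the helper name `raise_open_file_limit` is hypothetical. It only raises the soft limit, and never beyond the hard limit, which is all an unprivileged process may do, so whether it reaches 29,930 depends on how the machine is configured.

```python
import resource


def raise_open_file_limit(wanted: int) -> int:
    """Try to raise the soft RLIMIT_NOFILE to `wanted` (hypothetical helper).

    Unix-only: the `resource` module is not available on Windows.
    Never lowers the current limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    # An unprivileged process may raise its soft limit only up to the hard limit.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # the OS may enforce a lower cap than the reported hard limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("soft limit after raising:", raise_open_file_limit(30000))
```

Calling something like this at the top of the `__main__` block would let the dictionary-of-writers approach get further, but only as far as the hard limit allows; closing files as you go avoids depending on the limit at all.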

1 Answer

Answer 0 (score: 1)

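The dictionary keeps one open file per distinct key and never closes any of them, so once there are more keys than the per-process limit allows, open() fails. You don't need to keep the csv writers in a dictionary. Instead, reopen the file you want to add to in append mode for each row, so only one output file is open at a time: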

Replace:

    if k not in csv_writers:
        csv_writers[k] = csv.writer(open(os.path.join(dst_dir, k),
                                         mode='w', newline=''))
    csv_writers[k].writerow(row)

With:

    filename = os.path.join(dst_dir, k)
    with open(filename, mode='a', newline='') as output:
        csv.writer(output).writerow(row)
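The answer's idea can be folded back into the question's split_csv_file. The following is a minimal sketch, assuming the same f, dst_dir and keyfunc arguments as the question's code. The `seen` set is an addition not in the answer: the first row for a key opens its file with 'w' (overwriting leftovers from an earlier run) and writes the header, and later rows are appended with 'a'. Like the answer, it writes the whole row.

```python
import csv
import os


def split_csv_file(f, dst_dir, keyfunc):
    """Split the rows of CSV stream `f` into one file per key,
    keeping at most one output file open at any moment."""
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    seen = set()  # keys whose output file was already created in this run
    for row in csv_reader:
        k = keyfunc(row)
        filename = os.path.join(dst_dir, k)
        first_time = k not in seen
        # 'w' truncates files left over from a previous run; later rows append.
        mode = 'w' if first_time else 'a'
        with open(filename, mode=mode, newline='') as output:
            writer = csv.writer(output)
            if first_time:
                writer.writerow(header)
                seen.add(k)
            writer.writerow(row)
        # Leaving the `with` block closes the file before the next row is read.
```

Because every file is closed before the next row is processed, the number of distinct keys no longer matters. The trade-off is one open/close per input row, which is slower than keeping handles open; only the set of keys stays in memory.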