Question

How can I optimize this function to make it more Pythonic?

def flatten_rows_to_file(filename, rows):
    f = open(filename, 'a+')
    temp_ls = list()
    for i, row in enumerate(rows):
        temp_ls.append("%(id)s\t%(price)s\t%(site_id)s\t%(rating)s\t%(shop_id)s\n" % row)
        if i and i % 100000 == 0:
            f.writelines(temp_ls)
            temp_ls = []
    f.writelines(temp_ls)
    f.close()

Answer 1

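A few things immediately come to mind:

Use a with statement rather than closing the file manually.
Pass a generator expression to f.writelines rather than building up a 100000-row list over and over (let the standard library decide how much, if at all, it buffers the output).
Or, better yet, use the csv module to handle writing the tab-separated output.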
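
The first two suggestions can be sketched without the csv module. This is a minimal sketch, not the answerer's code: the ROW_FORMAT constant is a name introduced here for illustration, and the format string itself is copied from the question.

```python
# ROW_FORMAT is a hypothetical name; the format string comes from the question.
ROW_FORMAT = "%(id)s\t%(price)s\t%(site_id)s\t%(rating)s\t%(shop_id)s\n"

def flatten_rows_to_file(filename, rows):
    # The with block closes the file even if formatting a row raises.
    with open(filename, 'a') as f:
        # The generator yields one formatted line at a time, so no
        # intermediate list is built; the file object does the buffering.
        f.writelines(ROW_FORMAT % row for row in rows)
```

Each call appends to the file, as in the original function.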

Here is a quick attempt at some improved code:

from csv import DictWriter

def flatten_rows_to_file(filename, rows):
    with open(filename, 'ab') as f:
        writer = DictWriter(f, ['id','price','site_id','rating','shop_id'],
                            delimiter='\t')
        writer.writerows(rows)

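Note that if you are using Python 3, you need slightly different code to open the file: use mode 'a' rather than 'ab' and add the keyword argument newline="". You also did not need the + in the mode you were using (you are only writing, not writing and reading).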
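
For reference, the Python 3 variant described in that note might look like the following. It is a sketch of the same function with only the open() call changed; the FIELDS constant is a name introduced here for readability.

```python
from csv import DictWriter

# FIELDS is a hypothetical name; the column order matches the question.
FIELDS = ['id', 'price', 'site_id', 'rating', 'shop_id']

def flatten_rows_to_file(filename, rows):
    # Text mode with newline='' lets the csv module control line endings.
    with open(filename, 'a', newline='') as f:
        writer = DictWriter(f, FIELDS, delimiter='\t')
        writer.writerows(rows)
```

One difference from the original function: the csv writer ends each row with '\r\n' by default. Passing lineterminator='\n' to DictWriter keeps the plain '\n' endings the question's code produced.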

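If the values in your rows argument may have extra keys beyond the ones you are writing, you will also need to pass some extra arguments to the DictWriter constructor.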
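
The answer does not say which arguments it means. One likely candidate is extrasaction='ignore': by default DictWriter raises ValueError when a row contains a key that is not in the field list, and this option drops such keys silently instead. A sketch for Python 3, under that assumption:

```python
from csv import DictWriter

# FIELDS is a hypothetical name; the column order matches the question.
FIELDS = ['id', 'price', 'site_id', 'rating', 'shop_id']

def flatten_rows_to_file(filename, rows):
    with open(filename, 'a', newline='') as f:
        # extrasaction='ignore' skips keys that are not listed in FIELDS
        # instead of raising ValueError (the default is 'raise').
        writer = DictWriter(f, FIELDS, delimiter='\t',
                            extrasaction='ignore')
        writer.writerows(rows)
```

Keys that are missing from a row are a separate case: DictWriter fills them with its restval argument, which defaults to an empty string.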

Answer 2

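It is generally a good idea to use the with statement to make sure the file is closed properly. Also, unless I am mistaken, there is no need to buffer the lines manually. You can instead specify the buffer size when opening the file, which determines how often the file is flushed.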

def flatten_rows_to_file(filename, rows, buffsize=100000):
    with open(filename, 'a+', buffsize) as f:
        for row in rows:
            f.write("%(id)s\t%(price)s\t%(site_id)s\t%(rating)s\t%(shop_id)s\n" % row)
