Iterating over specific csv rows in Python outputs a blank file

Time: 2017-06-16 00:00:48

Tags: python csv itertools

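Python newb here - I'm trying to format a set of really gross csv's I was sent so that I can load them into a nice postgres table for querying and analysis. To do this, I first clean them with csv.writer to remove the blank rows and the double quotes wrapping each entry. Here's what my code looks like: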

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

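It works perfectly and does exactly what I want. The output csv looks great. Next, I try to lop off a certain number of rows containing completely unnecessary junk from both the beginning and the end of the newly cleaned csv files (omitting the first 8 rows and the last 2). For a reason I can't determine, the csv output by this part of the code (indented the same as the previous 'with' block) is completely empty: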

with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
    writer2 = csv.writer(out2)
    reader2 = csv.reader(inp2)
    row_count = sum(1 for row in reader2)
    last_line_index = row_count - 3 
    for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
    out2.close()

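I know that the close() at the end of each block is redundant because of my 'with' usage - I tried it as an approach after looking here. I also tried putting the second 'with' block into a separate file and running it after the first 'with' block had run, but still to no avail. Your help is greatly appreciated!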

Also, here is the whole file:

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3 
        for row in islice(reader2, 7, last_line_index):
                writer2.writerow(row)
        out2.close()

Thanks!

2 Answers:

Answer 0: (score: 2)

The guilty party is

row_count = sum(1 for row in reader2)

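It reads all of the data from reader2. A csv.reader is a one-pass iterator over the underlying file, so by the time you try for row in islice(reader2, 7, last_line_index), there is no data left to get.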
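
A minimal sketch of that behaviour, using an in-memory io.StringIO in place of a real file (the sample data and variable names here are made up for illustration, and the code is written for Python 3):

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for the cleaned csv file
sample = io.StringIO("a,1\nb,2\nc,3\nd,4\n")
reader = csv.reader(sample)

# Counting the rows consumes the reader completely
row_count = sum(1 for row in reader)

# The reader is now exhausted, so islice has nothing left to yield
leftover = list(islice(reader, 1, row_count - 1))

print(row_count)
print(leftover)
```

The count comes out as expected, but the second pass yields no rows, which is why nothing ever reaches the writer.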

Also, you are probably reading a lot of blank lines because you open the file as binary; in Python 3, do this instead:

with open('file.csv', newline='') as inf:
    rd = csv.reader(inf)
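
Putting both points together, here is one possible sketch for Python 3. It assumes each file is small enough to hold in memory, so the rows are read once into a list and sliced, which avoids the exhausted-reader problem entirely. The function name trim_csv and its parameters are hypothetical, and the defaults follow the question's description (drop the first 8 and last 2 rows) rather than the 7 and row_count - 3 used in its code.

```python
import csv

def trim_csv(src_path, dst_path, skip_head=8, skip_tail=2):
    """Copy src_path to dst_path, dropping blank rows plus the first
    skip_head and last skip_tail remaining rows. Returns the rows written."""
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(src_path, newline='') as inf:
        # Read every non-empty row exactly once into a list
        rows = [row for row in csv.reader(inf) if row]

    # Explicit end index: never smaller than the start, and correct
    # even when skip_tail is 0 (rows[a:-0] would be empty)
    end = max(len(rows) - skip_tail, skip_head)
    kept = rows[skip_head:end]

    with open(dst_path, 'w', newline='') as outf:
        csv.writer(outf).writerows(kept)
    return len(kept)
```

Inside the question's loop this could be called as trim_csv(file, '/Users/foo/qux/' + outfile), which would also replace the intermediate cleaning step, since blank rows are filtered while reading.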

Answer 1: (score: 1)

You can quickly fix the code like this (I commented the problem spot; as @Hugh Bothwell said, you have already read all of the data through the variable reader2):

import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')

# Loop through all of the csv's  
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)

    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:

        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close() 

    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        # Count the rows with a separate, throwaway reader
        row_count = sum(1 for row in csv.reader(inp2))
        # Every reader shares the file object inp2, so rewind it before reading the rows again
        inp2.seek(0)
        reader2 = csv.reader(inp2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
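
If counting the rows up front is not required, one alternative (a sketch, not something proposed in either answer) is to stream the rows once and hold back only the last few in a small buffer. The helper name drop_head_and_tail and the sample data are hypothetical, and the code is written for Python 3.

```python
import csv
import io
from collections import deque
from itertools import islice

def drop_head_and_tail(rows, skip_head, skip_tail):
    """Yield rows from an iterator, omitting the first skip_head and last skip_tail."""
    rows = islice(rows, skip_head, None)  # skip the leading rows lazily
    if skip_tail <= 0:
        yield from rows
        return
    buffer = deque()
    for row in rows:
        buffer.append(row)
        # Once the buffer holds more than skip_tail rows,
        # the oldest one cannot belong to the tail, so release it
        if len(buffer) > skip_tail:
            yield buffer.popleft()

# Hypothetical sample: 12 rows, r0 through r11
sample = io.StringIO("".join("r%d,x\n" % i for i in range(12)))
kept = list(drop_head_and_tail(csv.reader(sample), 8, 2))
print(kept)
```

Because the reader is only traversed once, there is no need to count, rewind, or reopen the file, and only skip_tail rows are held in memory at any time.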