Question

I have a very simple CSV file of this kind (I'm using Fibonacci numbers as an example):

I'm just trying to process the rows in batches as follows (the Fibonacci numbers themselves don't matter):

import csv
b=0
s=1
i=1
itera=0
maximum=10000
bulk_save=10
csv_file='really_simple.csv'
fo = open(csv_file)
reader = csv.reader(fo)
##Skipping headers
_headers=reader.next()

while (s>0) and itera<maximum:
    print 'processing...'
    b+=1
    tobesaved=[]
    for row,i in zip(reader,range(1,bulk_save+1)): 
        itera+=1
        tobesaved.append(row)
        print itera,row[0]    
    s=len(tobesaved)        
    print 'chunk no '+str(b)+' processed '+str(s)+' rows'  
print 'Exit.'

The output I get is a bit odd (it looks as if the reader skips one entry at the end of each loop):

processing...
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
chunk no 1 commited 10 rows
processing...
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
chunk no 2 commited 10 rows
processing...
21 23
22 24
23 25
24 26
25 27
chunk no 3 commited 5 rows
processing...
chunk no 4 commited 0 rows
Exit.

Any idea what the problem is? My guess is the zip function.
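The guess about zip is correct. zip takes one item from each of its arguments in turn, left to right, and stops at the first one that is exhausted. With zip(reader, range(1, bulk_save+1)), the reader has already handed over the 11th row by the time zip finds that the range is used up, so that row is silently dropped. A minimal Python 3 sketch that reproduces this, assuming a hypothetical in-memory CSV in place of really_simple.csv, and shows that putting the range first avoids the loss:

```python
import csv
import io

# Hypothetical stand-in for really_simple.csv: a header plus 25 numbered rows.
data = "n,value\n" + "".join("{0},{0}\n".format(k) for k in range(1, 26))

def first_column(rows):
    return [row[0] for row in rows]

# Reader first: on the 11th step zip pulls a row from the reader, then finds
# the range exhausted, so that row is consumed and thrown away.
reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header
lossy_1 = [row for row, _ in zip(reader, range(10))]
lossy_2 = [row for row, _ in zip(reader, range(10))]

# Range first: zip stops as soon as the range is exhausted,
# before it asks the reader for another row.
reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header
safe_1 = [row for _, row in zip(range(10), reader)]
safe_2 = [row for _, row in zip(range(10), reader)]

print(first_column(lossy_2))  # starts at '12': row 11 was swallowed
print(first_column(safe_2))   # starts at '11': nothing lost
```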

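The reason I structure the code this way (fetching the data in chunks) is that I need to save the CSV entries in bulk to a sqlite3 database, using executemany and commit at the end of each zip loop, so that I don't overload memory. Thanks!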
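For the sqlite3 use case described above, one way to get fixed-size chunks without losing rows is itertools.islice, which stops after the requested number of items without reading an extra one. A minimal Python 3 sketch, assuming a hypothetical table fib(n, value) in an in-memory database and an in-memory CSV in place of really_simple.csv (neither comes from the question):

```python
import csv
import io
import sqlite3
from itertools import islice

CHUNK_SIZE = 10

def fib_rows(count):
    # Hypothetical data generator: (index, Fibonacci number) pairs.
    a, b = 1, 1
    for n in range(1, count + 1):
        yield n, a
        a, b = b, a + b

# Hypothetical stand-in for really_simple.csv: a header plus 27 rows.
data = "n,fib\n" + "".join("{},{}\n".format(n, f) for n, f in fib_rows(27))

conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("CREATE TABLE fib (n INTEGER, value INTEGER)")  # hypothetical schema

reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header row

chunk_sizes = []
while True:
    # At most CHUNK_SIZE rows are held in Python at a time;
    # islice never pulls a row it will not return.
    chunk = list(islice(reader, CHUNK_SIZE))
    if not chunk:
        break
    conn.executemany("INSERT INTO fib (n, value) VALUES (?, ?)", chunk)
    conn.commit()  # commit after each chunk, as the question describes
    chunk_sizes.append(len(chunk))

total = conn.execute("SELECT COUNT(*) FROM fib").fetchone()[0]
print(chunk_sizes, total)  # [10, 10, 7] 27
```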

Answer 1

Try the following:

import csv

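def process(rows, chunk_no):
    # Stand-in for the real work (e.g. executemany + commit in the asker's case)
    for no, data in rows:
        print no, data
    print 'chunk no {} process {} rows'.format(chunk_no, len(rows))

csv_file='really_simple.csv'
with open(csv_file) as fo:
    reader = csv.reader(fo)
    _headers = reader.next()  # skip the header row

    chunk_no = 1
    tobesaved = []
    # Every row taken from the reader is kept; nothing is pulled and discarded.
    for row in reader:
        tobesaved.append(row)
        if len(tobesaved) == 10:  # chunk is full: process it and start a new one
            process(tobesaved, chunk_no)
            chunk_no += 1
            tobesaved = []
    # Process the final, partially filled chunk, if any.
    if tobesaved:
        process(tobesaved, chunk_no)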

This prints:

1 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
chunk no 1 process 10 rows
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
chunk no 2 process 10 rows
21 10946
22 17711
23 28657
24 46368
25 75025
26 121393
27 196418
chunk no 3 process 7 rows
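The answer works because it never asks the reader for a row it does not keep: every row goes into tobesaved, and the leftover rows are flushed by the final if tobesaved check, which is why the last chunk has 7 rows. The same buffer-and-flush idea can be packaged as a generator. A minimal Python 3 adaptation, with a hypothetical chunks helper and an in-memory CSV in place of really_simple.csv (neither is part of the original answer):

```python
import csv
import io

def chunks(rows, size):
    """Yield lists of up to `size` rows, buffering them like the answer does."""
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == size:
            yield buffer
            buffer = []
    # Flush the final, partially filled chunk, if any.
    if buffer:
        yield buffer

# Hypothetical stand-in for really_simple.csv with 27 data rows.
data = "n,fib\n" + "".join("{0},{0}\n".format(k) for k in range(1, 28))
reader = csv.reader(io.StringIO(data))
next(reader)  # Python 3 spelling of reader.next(): skip the header

sizes = []
for chunk_no, chunk in enumerate(chunks(reader, 10), start=1):
    print("chunk no {} process {} rows".format(chunk_no, len(chunk)))
    sizes.append(len(chunk))
```

Because chunks only wraps an iterator, the same helper can feed executemany one chunk at a time.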

Python csv reader: zipping the reader with a range
