Split data into chunks and add the count to the result

Time: 2014-09-12 13:55:08

Tags: python multiprocessing

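import multiprocessing as mp

def multi():
   jobs = []
   # all is presumably the asker's own worker function (not shown);
   # note that the name shadows the builtin all()
   with open('raw.txt', 'r', 16777216) as f:
      r = f.read().splitlines()
   for i in r:
      # starts one process per line of raw.txt
      p = mp.Process(target=all, args=(i,))
      jobs.append(p)
      p.start()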

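Each line of raw.txt is a URL.

Please explain how I can modify multi() to:

a) split raw.txt into chunks (say, 10 lines each) and apply all() to each chunk, and

b) finally return the number of lines/chunks processed.

Thanks,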

1 answer:

Answer 0: (score: 0)

Take a look at the itertools package; it has a lot of useful tools.

>>> import uuid
>>> with open('input.txt', 'w') as f:
...   for i in range(998):
...     f.write(uuid.uuid4().hex + '\n')
... 
>>> 
>>> from itertools import groupby, count
>>> with open('input.txt', 'r') as f:
...     samples = groupby(f, key=lambda k, line=count(): next(line)//100)
...     for i in samples:
...       print(i)
... 
(0, <itertools._grouper object at 0x7f174f170c50>)
(1, <itertools._grouper object at 0x7f1740804f50>)
(2, <itertools._grouper object at 0x7f174f170c50>)
(3, <itertools._grouper object at 0x7f1740804f50>)
(4, <itertools._grouper object at 0x7f174f170c50>)
(5, <itertools._grouper object at 0x7f1740804f50>)
(6, <itertools._grouper object at 0x7f174f170c50>)
(7, <itertools._grouper object at 0x7f1740804f50>)
(8, <itertools._grouper object at 0x7f174f170c50>)
(9, <itertools._grouper object at 0x7f1740804f50>)
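
The session above only prints the lazy grouper objects. Each group has to be consumed (for example with list(group)) before moving on to the next one, because groupby shares a single underlying iterator. To tie this back to the question, here is a minimal sketch of one way multi() could be restructured. It swaps groupby for an islice-based helper and uses a worker pool instead of one process per line. The names chunked and process_chunk and the parameters path, chunk_size and workers are made up for illustration, and process_chunk is only a stand-in for the question's all(), which is not shown.

```python
import multiprocessing as mp
from itertools import islice


def chunked(lines, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_chunk(urls):
    """Hypothetical stand-in for the question's all(): handle one chunk.

    Returns the number of lines it handled so the parent can total them.
    """
    # The real per-URL work would go here.
    return len(urls)


def multi(path="raw.txt", chunk_size=10, workers=4):
    """Apply process_chunk to every chunk; return (chunks, lines) processed."""
    with open(path) as f:
        # Drop blank lines so a trailing newline does not count as a URL.
        urls = [line.strip() for line in f if line.strip()]
    with mp.Pool(processes=workers) as pool:
        # Each chunk (a list of URLs) is sent to a worker process.
        counts = pool.map(process_chunk, chunked(urls, chunk_size))
    return len(counts), sum(counts)
```

Call multi() from under an `if __name__ == "__main__":` guard so it also works on platforms that spawn rather than fork worker processes. With the 998-line input.txt created above, multi('input.txt', 100) should come back as (10, 998): nine full chunks plus one of 98 lines.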