I'm working on a virtual-memory simulator and have run into a problem. I need to read n (8) lines at a time from k (4) files, round-robin style: for example, I read the first 8 lines of file 1, file 2, file 3 and file 4, then lines 9-17 from each file, and so on until every file is exhausted.
The file input itself is not the problem, and I have already written this code:
def rr_process(quantum, file, fline):
    global rr_list                              # List to save the lines read
    condition = file_len(file)                  # Returns the total number of lines of the passed file
    with open(file) as fp:
        line = fp.readlines()                   # Save all the lines of the file in a list
        for i in range(fline, fline + quantum): # for i in range(new_start_line, new_start_line + n_lines)
            if i <= condition - 1:
                sline = line[i].rstrip()        # Remove \n from lines
                rr_list.append(sline)           # Append the n_lines to the list
            else:
                break

operation = concat_count // (n_proc * quantum)  # total_lines // (k_files * n_lines)
for i in range(0, operation):
    for fname in process:                       # Open each file (4)
        rr_process(quantum, fname, fline)       # Call the read-lines function
    fline = fline + quantum + 1                 # New start line number 0-9-17...
I haven't been able to get it to work: I need to read 50,000 lines, but my program only reads 44,446. What is the error in my code, or is there a better way to approach this? Thanks, everyone!
Answer 0 (score: 1)
Using the grouper and roundrobin functions provided in the documentation for the itertools module, this can be reduced to just a few lines of code.
import contextlib
from itertools import zip_longest, cycle, islice, chain

# Define grouper() and roundrobin() here

with contextlib.ExitStack() as stack:
    # Open each file *once*; the exit stack will make sure they get closed
    files = [stack.enter_context(open(fname)) for fname in process]
    # Instead of iterating over each file line by line, we'll iterate
    # over them in 8-line batches.
    groups = [grouper(f, 8) for f in files]
    # Interleave the groups by taking an 8-line group from one file,
    # then another, etc.
    interleaved = roundrobin(*groups)
    # *Then* flatten them into a stream of single lines
    flattened = chain.from_iterable(interleaved)
    # Filter out the None padding added by grouper() and
    # read the lines into a list
    lines = list(filter(lambda x: x is not None, flattened))
Note that nothing is actually read from the files until list is called; up to that point you are only building a functional pipeline that processes the input on demand.
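To see that laziness in action, here is a minimal sketch that replaces the real files with a hypothetical noisy_lines() generator; everything in it except grouper() and roundrobin() (whose definitions are given below) is made up purely for illustration:

from itertools import chain, islice

def noisy_lines(name, count):
    # Hypothetical stand-in for an open file: reports every "line" it hands out.
    for i in range(count):
        print(f"reading line {i} of {name}")
        yield f"{name} line {i}\n"

# Build the same kind of pipeline as above; nothing is printed (read) yet.
groups = [grouper(noisy_lines(name, 20), 8) for name in ("a", "b")]
flattened = chain.from_iterable(roundrobin(*groups))

# Pulling just three lines forces exactly one 8-line batch to be read from "a";
# "b" is not touched at all.
first_three = list(islice(flattened, 3))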
For reference, these are the definitions of grouper and roundrobin, copied from the documentation.
# From itertools documentation
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
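For example, a quick check of grouper()'s padding behaviour (this is just the recipe's own documented example):

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

In the pipeline above the fill value is left at its default of None, which is why the None padding is filtered out at the end.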
# From itertools documentation
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Remove the iterator we just exhausted from the cycle.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))
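And, correspondingly, what roundrobin() yields for its docstring example; in the pipeline above each interleaved element is an 8-line group rather than a single character:

print(list(roundrobin('ABC', 'D', 'EF')))
# ['A', 'D', 'E', 'B', 'F', 'C']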
Answer 1 (score: 0)
I ended up with something very similar to chepner's...
First, we define a simple function that iterates over the lines in a file and groups them into blocks:
def read_blocks(path, nlines):
    with open(path) as fd:
        out = []
        for line in fd:
            out.append(line)
            if len(out) == nlines:
                yield out
                out = []
        if out:
            yield out
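As a quick sanity check (sample.txt here is a hypothetical six-line file), the generator yields full blocks first and then whatever is left over:

for block in read_blocks('sample.txt', 4):
    print(len(block))
# prints 4, then 2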
Then I define a function that interleaves the output of a set of iterators (i.e. the same as roundrobin in chepner's answer; I find the version in itertools a little opaque):
def interleave(*iterables):
    iterables = [iter(it) for it in iterables]
    i = 0
    while iterables:
        try:
            yield next(iterables[i])
        except StopIteration:
            del iterables[i]
        else:
            i += 1
        if i >= len(iterables):
            i = 0
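A quick check shows it behaves like roundrobin(); in read_files_in_blocks() below, the interleaved items are blocks of lines rather than single characters:

print(list(interleave('ABC', 'D', 'EF')))
# ['A', 'D', 'E', 'B', 'F', 'C']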
Then we define a function that puts the above together:
def read_files_in_blocks(filenames, nlines):
    return interleave(*(read_blocks(path, nlines) for path in filenames))
And call it with some dummy data:
filenames = ['foo.txt', 'bar.txt', 'baz.txt']

for block in read_files_in_blocks(filenames, 5):
    for line in block:
        print(line)