Question

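I have a working script that extracts certain data from a series of huge text files. Unfortunately, I went down the 'readlines' route, so after processing a certain number of files my code runs out of memory.

I am trying to re-write my code to process the files line by line using the 'for line in file:' format, but I am now having problems with my line processing once the string is found.

Basically, once my string is found, I wish to go to various surrounding lines in the text file, so I wish to go back say 16 (and 10 and 4) lines before and do some line processing to gather some data relevant to the search line. Using the readlines route I enumerated the file, but I am struggling to work out the right way to do this with the line-by-line method (or indeed to find out whether it is even possible!).

Here is my code. I admit I have some bad code in there, as I have been playing about with a few ways of grabbing the lines, basically around the line[-xx] parts...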

searchstringsFilter1 = ['Filter Used          : 1']


with open(file, 'r') as f:
    for line in f:

        timestampline = None
        timestamp = None

        for word in searchstringsFilter1:
            if word in line:
                #print line
                timestampline = line[-16]
                #print timestampline
                keyline = line
                Rline = line[-10]
                print Rline

                Rline = re.sub('[()]', '', Rline)   
                SNline = line[-4]
                SNline = re.sub('[()]', '', SNline) 

                split = keyline.split()
                str = timestampline
                match = re.search(r'\d{2}:\d{2}:\d{2}.\d{3}', str)
                valueR = Rline.split()
                valueSN = SNline.split()

                split = line.split()

                worksheetFilter.write(row_num,0,match.group()) 
                worksheetFilter.write(row_num,1,split[3], integer_format)
                worksheetFilter.write(row_num,2,valueR[4], decimal_format)
                worksheetFilter.write(row_num,3,valueSN[3], decimal_format)
                row_num+=1
                tot = tot+1
                break

    print 'total count for', '"',searchstringsFilter1[a],'"', 'is', tot
    Filtertot = tot
    tot = 0

Is there anything obvious I am doing wrong, or am I following an entirely incorrect path for what I want to do?

Many thanks for reading this, MikG

Answer 1

If you know how many lines you need to work with at a time (let's say you need 16 lines at a time), you can do this:

with open(file, 'r') as f:
    # Some sort of loop...
    chunk = [next(f) for x in xrange(16)]

chunk should then contain the next 16 lines of the file.
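One caveat with the snippet above: next(f) raises StopIteration when fewer than 16 lines remain. A minimal sketch of a more forgiving variant follows, using itertools.islice, which simply returns a shorter list at the end of the file. The helper name read_chunk is hypothetical, io.StringIO stands in for a real file, and the code uses Python 3 syntax, whereas the code in this thread is Python 2.

```python
import io
from itertools import islice


def read_chunk(f, size=16):
    """Return up to `size` lines from the open file `f` (hypothetical helper)."""
    return list(islice(f, size))


# io.StringIO stands in for a real file so the sketch runs as-is.
sample = io.StringIO("".join("line %d\n" % i for i in range(20)))

first = read_chunk(sample)   # the first 16 lines
second = read_chunk(sample)  # the remaining 4 lines
third = read_chunk(sample)   # an empty list once the file is exhausted
```

An empty list is then a convenient end-of-file signal for whatever loop wraps the call.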

Edit: after some clarification, this may be more useful:

with open(file, 'r') as f:
    chunk = [next(f) for x in xrange(16)]

    while not whatWeWant(chunk[15]):
        chunk.append(next(f))
        chunk.pop(0)

Obviously this needs some guards and checks, but I think it is what you want. chunk[15] will be the line you want to find, and chunk[0:15] will be the lines before it.
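As a rough illustration of the guards mentioned above, the sliding window could be wrapped in a function that returns None instead of raising when the file is too short or no match is found. This is only a sketch: find_with_context and what_we_want are hypothetical names, io.StringIO stands in for a real file, and the syntax is Python 3. Like the original loop, it only tests the last line of each window, so a match within the first 15 lines of the file is not detected.

```python
import io
from itertools import islice


def find_with_context(f, what_we_want, size=16):
    """Return the first `size`-line window whose last line matches, else None."""
    chunk = list(islice(f, size))
    if len(chunk) < size:
        return None  # the file is shorter than one full window
    while not what_we_want(chunk[-1]):
        try:
            chunk.append(next(f))
        except StopIteration:
            return None  # reached the end of the file without a match
        chunk.pop(0)  # discard the oldest line to keep the window size fixed
    return chunk


# Usage with an in-memory stand-in for a file.
sample = io.StringIO("".join("line %d\n" % i for i in range(30)))
window = find_with_context(sample, lambda s: s.strip() == "line 20")
```

Here window[-1] is the matching line and window[:-1] holds the 15 lines before it.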

Answer 2

You need a circular buffer to temporarily keep the previous lines in memory. This can be done using collections.deque:

import collections
import re

ring_buf = collections.deque(maxlen=17)

with open(file, 'r') as f:
    for line in f:
        ring_buf.append(line)  # append the new line; once the buffer is
                               # full, the oldest line is dropped (FIFO)

        timestampline = None
        timestamp = None

        for word in searchstringsFilter1:
            if word in line:
                #print line
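                # the matched line is ring_buf[-1], so the line N lines
                # before it is ring_buf[-(N + 1)]; these lookups raise
                # IndexError until the buffer holds 17 lines
                timestampline = ring_buf[-17]
                #print timestampline
                keyline = line
                Rline = ring_buf[-11]
                print Rline

                Rline = re.sub('[()]', '', Rline)
                SNline = ring_buf[-5]
                SNline = re.sub('[()]', '', SNline)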
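For reference, here is a self-contained sketch of the ring-buffer approach with a guard for matches that occur before the buffer is full. It assumes the wanted lines sit 16, 10 and 4 lines before the matched line, with the matched line itself at ring_buf[-1]. The function name extract_context and the sample data are made up for illustration, and the syntax is Python 3 rather than the Python 2 used in this thread.

```python
import collections
import io
import re


def extract_context(f, search, offsets=(16, 10, 4)):
    """Collect the lines found `offsets` lines before each line containing `search`."""
    maxlen = max(offsets) + 1  # room for the matched line plus the oldest context line
    ring_buf = collections.deque(maxlen=maxlen)
    hits = []
    for line in f:
        ring_buf.append(line)  # the oldest line is dropped automatically when full
        if search in line and len(ring_buf) == maxlen:
            ts_off, r_off, sn_off = offsets
            timestamp_line = ring_buf[-(ts_off + 1)].strip()
            # Remove parentheses, as the original code does with re.sub.
            r_line = re.sub(r"[()]", "", ring_buf[-(r_off + 1)]).strip()
            sn_line = re.sub(r"[()]", "", ring_buf[-(sn_off + 1)]).strip()
            hits.append((timestamp_line, r_line, sn_line))
    return hits


# Made-up sample: 20 filler rows followed by one matching line.
text = "".join("row %d (x)\n" % i for i in range(20)) + "Filter Used          : 1\n"
hits = extract_context(io.StringIO(text), "Filter Used          : 1")
```

Only max(offsets) + 1 lines are held in memory at any time, regardless of file size, which avoids the memory growth seen with readlines.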

Re-writing my code using Python from 'readlines' to 'for line in file' format
