How to detect the third line before a given line

Time: 2014-04-25 15:39:50

Tags: python python-2.7

I have a file like the following:

<prop type="ltattr-match">1-1</prop>
id =>3</prop>
<tuv xml:lang="en">
<seg> He is not a good man </seg>

What I want is to detect the third line before "He is not a good man", i.e. (id => 3). The file is large. What can I do?

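4 answers:

Answer 0: (score: 2)

I suggest using a double ended queue with a maximum length: that way, only the required "backlog" is stored and you don't have to slice anything manually. We don't need the "double-ended" part, but the normal Queue class blocks if the queue is full.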

import collections
dq = collections.deque([], 3)        # empty queue that keeps at most 3 lines

with open("mybigfile.txt") as f:
    for line in f:                   # iterate lazily; readlines() would load the whole file
        if line.startswith('<seg>') and len(dq) == 3:
            print(dq[0])             # third line before the match (or add to a list)
        dq.append(line)              # save the line; if 3 lines are already stored,
                                     # the oldest line is discarded.
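
To see why the bounded queue needs no manual slicing, here is a small sketch of how `collections.deque` behaves once its `maxlen` is reached. The sample strings are made up for illustration.

```python
import collections

# A deque with maxlen=3 silently drops its oldest item on each append once full.
dq = collections.deque(maxlen=3)
for item in ["line1", "line2", "line3", "line4", "line5"]:
    dq.append(item)

window = list(dq)   # the three most recent items, oldest first
oldest = dq[0]      # the item three positions back from the next line to be read
print(window)
print(oldest)
```

Because the oldest entry always sits at index 0, `dq[0]` is the third line before whatever line is read next, provided the queue already holds three lines.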

Answer 1: (score: 1)

Read each line in order, keeping only the last 3 lines read at any time.

Something like:

# Assume f is a file object open to your file
last3 = []
last3.append(f.readline())
last3.append(f.readline())
last3.append(f.readline())
while True:
    line = f.readline()
    if line.startswith('<seg>'):   # substitute your own condition here
        break
    last3 = last3[1:] + [line]
# At this point last3[0] is 3 lines before the matching line

You will need to modify it to handle files with fewer than 3 lines, or the case where no line satisfies your condition.
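
A minimal sketch that handles a file with fewer than 3 lines and the no-match case. The helper name `line_before_match` and its parameters are hypothetical, chosen for illustration; it returns `None` when nothing matches or when too few lines precede the first match.

```python
import collections

def line_before_match(lines, matches, n=3):
    """Return the line n lines before the first line for which matches(line)
    is true, or None if no line matches or fewer than n lines precede it.

    `lines` can be any iterable of strings, including an open file object,
    so the whole file never has to be held in memory. n must be at least 1.
    """
    recent = collections.deque(maxlen=n)   # the last n lines seen so far
    for line in lines:
        if matches(line):
            # recent[0] is n lines back only when the window is full
            return recent[0] if len(recent) == n else None
        recent.append(line)
    return None                            # no line matched


# Sample data copied from the question, used here instead of a real file.
sample = [
    '<prop type="ltattr-match">1-1</prop>\n',
    'id =>3</prop>\n',
    '<tuv xml:lang="en">\n',
    '<seg> He is not a good man </seg>\n',
]
found = line_before_match(sample, lambda s: s.startswith('<seg>'))
print(found)
```

With a real file, the same call should work by passing the open file object in place of `sample`.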

Answer 2: (score: 1)

with open("mybigfile.txt") as f:
    lines = f.readlines()

for idx, line in enumerate(lines):
    # idx >= 3 guards against a negative index wrapping to the end of the list
    if line.startswith("<seg>") and idx >= 3:
        line_to_detect = lines[idx-3]
        # use idx-2 if you want the _second_ line before this one,
        # e.g. `id =>3</prop>`
        print("This line was detected:")
        print(line_to_detect)

Result:

This line was detected:
<prop type="ltattr-match">1-1</prop>

As we previously discussed in chat, this method can use a lot of memory for very large files. But 100 pages is not very large, so this should be fine.

Answer 3: (score: 0)

path = "path/to/the/file"
f = open(path, "r")
lines = f.readlines()
f.close()
i = 0
for line in lines:
    if "<seg> He is not a good man </seg>" in line and i >= 1:
        print(lines[i-1])  # print the previous line
    i += 1

Change it to print(lines[i-2]) (and the check to i >= 2) if you need the second line before it.