Question

I have a file that looks like this:

<prop type="ltattr-match">1-1</prop>
id =>3</prop>
<tuv xml:lang="en">
<seg> He is not a good man </seg>

What I want is to detect the third line before "He is not a good man", i.e. (id => 3). The file is large. What can I do?

Answer 1

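I suggest using a double ended queue with a maximum length: that way only the needed "backlog" is stored, and you don't have to slice manually. We don't need the "double-ended" part, but the normal Queue class blocks when the queue is full.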

import collections
dq = collections.deque([], 3)        # create an empty queue

with open("mybigfile.txt") as file:
    for line in file:                    # iterate lazily; readlines() would load the whole file
        if line.startswith('<seg>') and len(dq) == 3:
            print(dq[0])                 # the line 3 before the match (or add it to a list)
        dq.append(line)              # save the line, if already 3 lines stored,
                                     # discard oldest line.
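
The same idea can be wrapped in a function, which is where a `return` would make sense. This is only a minimal sketch: the function name `line_before_match` and its parameters `prefix` and `n` are made up for illustration, and it assumes you only care about the first matching line.

```python
import collections

def line_before_match(lines, prefix="<seg>", n=3):
    """Return the line n lines before the first line starting with prefix.

    `lines` is any iterable of strings, e.g. an open file object.
    Returns None if nothing matches or fewer than n lines precede the match.
    """
    backlog = collections.deque(maxlen=n)  # keeps only the n most recent lines
    for line in lines:
        if line.startswith(prefix):
            # backlog[0] is the oldest stored line, i.e. n lines back
            return backlog[0] if len(backlog) == n else None
        backlog.append(line)
    return None

# Sample data taken from the question; with a real file you would pass
# the open file object instead of a list.
sample = [
    '<prop type="ltattr-match">1-1</prop>\n',
    'id =>3</prop>\n',
    '<tuv xml:lang="en">\n',
    '<seg> He is not a good man </seg>\n',
]
result = line_before_match(sample)
```

Passing `n=2` would give the `id =>3</prop>` line instead. Because the deque never holds more than `n` lines, memory use stays constant no matter how big the file is.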

Answer 2

Read the lines sequentially, keeping only the last 3 lines read at any given time.

Something like this:

# Assume f is a file object open to your file
last3 = []
last3.append( f.readline() )
last3.append( f.readline() )
last3.append( f.readline() )
while ( True ):
    line = f.readline()
    if line.startswith("<seg>"):  # example condition; substitute your own
        break
    last3 = last3[1:]+[line]
# At this point last3[0] is 3 lines before the matching line

You will need to modify it to handle files with < 3 lines, or the case where no line satisfies your condition.
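
One possible way to cover both of those cases, as a rough sketch rather than the definitive version: the helper name `third_line_before` and the `is_match` callback are hypothetical, and `io.StringIO` stands in for a real file here.

```python
import io

def third_line_before(f, is_match):
    """Return the line 3 lines before the first matching line read from f.

    Returns None when the file ends without a match, or when the match
    has fewer than 3 lines in front of it.
    """
    last3 = []
    while True:
        line = f.readline()
        if line == "":              # readline() returns "" at end of file
            return None
        if is_match(line):
            return last3[0] if len(last3) == 3 else None
        last3 = (last3 + [line])[-3:]   # keep at most the 3 newest lines

f = io.StringIO("a\nb\nc\n<seg> x </seg>\n")
found = third_line_before(f, lambda s: s.startswith("<seg>"))
```

Checking for the empty string is what stops the loop at end of file; a blank line in the file is still `"\n"`, so it is not mistaken for the end.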

Answer 3

with open("mybigfile.txt") as file:
    lines = file.readlines()

for idx, line in enumerate(lines):
    # require idx >= 3, otherwise lines[idx-3] wraps around to the end of the list
    if line.startswith("<seg>") and idx >= 3:
        line_to_detect = lines[idx-3]
        #use idx-2 if you want the _second_ line before this one,
        #ex `id =>3</prop>`
        print("This line was detected:")
        print(line_to_detect)

Result:

This line was detected:
<prop type="ltattr-match">1-1</prop>

As we previously discussed in chat, this approach can use a lot of memory for very large files. But 100 pages is not very large, so this should be fine.
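
If memory ever does become a concern, a streaming variant that still reports every match might look roughly like this. It is a sketch only: `lines_before_matches` is a made-up name, and it swaps the list indexing for a bounded `collections.deque`.

```python
from collections import deque

def lines_before_matches(lines, prefix="<seg>", n=3):
    """Yield (earlier_line, matching_line) for every line starting with prefix.

    Only the n most recent lines are kept, so the whole file is never in memory.
    Matches with fewer than n lines in front of them are skipped.
    """
    backlog = deque(maxlen=n)
    for line in lines:
        if line.startswith(prefix) and len(backlog) == n:
            yield backlog[0], line
        backlog.append(line)  # matching lines still count as context for later matches

sample = ["a\n", "b\n", "c\n", "<seg> one </seg>\n", "d\n", "e\n", "<seg> two </seg>\n"]
pairs = list(lines_before_matches(sample))
```

With a real file you would pass the open file object instead of `sample`, and loop over the generator rather than building a list.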

Answer 4

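file = "path/to/the/file"
with open(file, "r") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    # i >= 1 makes sure there is a previous line to print
    if "<seg> He is not a good man </seg>" in line and i >= 1:
        print(lines[i - 1])  # print the previous line

If you need the second line before it, change this to print(lines[i - 2]) and the check to i >= 2.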

How to detect the third line before a certain line

4 answers: