I have a file like this:
<prop type="ltattr-match">1-1</prop>
id =>3</prop>
<tuv xml:lang="en">
<seg> He is not a good man </seg>
What I want is to detect the third line before "He is not a good man", i.e. (id => 3). The file is large. What can I do?
Answer 0 (score: 2)
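I'd suggest using a double ended queue with a maximum length: that way only the "backlog" you need is stored, and you don't have to slice manually. We don't need the "double-ended" part, but the normal Queue class blocks when the queue is full.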
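import collections

def line_before_seg(path, n=3):
    dq = collections.deque([], n)  # empty queue holding at most n lines
    with open(path) as file:
        for line in file:  # iterate lazily; readlines() would load the whole file
            if line.startswith('<seg>') and len(dq) == n:
                return dq[0]  # the line n lines before the match (or add to a list)
            dq.append(line)  # save the line; if n lines are already stored,
                             # the oldest one is discarded
    return None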
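A slightly more general sketch of the same deque idea, in case you want every match rather than just the first. The function name `lines_before_matches`, the `prefix` parameter, and stripping leading whitespace before the `<seg>` check are my own assumptions, not part of the answer above; `io.StringIO` just stands in for the real file.

```python
import collections
import io


def lines_before_matches(lines, n=3, prefix="<seg>"):
    """Yield the line that appears n lines before each line starting with prefix.

    Only the last n lines are held in memory, so `lines` can be an open file
    object of any size. Matches with fewer than n preceding lines are skipped.
    """
    window = collections.deque(maxlen=n)  # the oldest line drops out automatically
    for line in lines:
        # Hypothetical matching rule: ignore indentation before the tag.
        if line.lstrip().startswith(prefix) and len(window) == n:
            yield window[0]  # window[0] is exactly n lines back
        window.append(line)


sample = io.StringIO(
    '<prop type="ltattr-match">1-1</prop>\n'
    "id =>3</prop>\n"
    '<tuv xml:lang="en">\n'
    "<seg> He is not a good man </seg>\n"
)
for found in lines_before_matches(sample, n=3):
    print(found.rstrip("\n"))
```

With `n=3` this prints the `<prop type="ltattr-match">1-1</prop>` line; passing `n=2` would return the `id =>3</prop>` line instead. Because it is a generator, you can also stop after the first result with `next(...)`.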
Answer 1 (score: 1)
Read each line in order, and at any point remember only the last 3 lines you have read.
Something like:
# Assume f is a file object open to your file
last3 = []
last3.append(f.readline())
last3.append(f.readline())
last3.append(f.readline())
found = False
while True:
    line = f.readline()
    if not line:  # readline() returns '' at end of file
        break
    if line.startswith('<seg>'):  # replace with your own condition
        found = True
        break
    last3 = last3[1:] + [line]
# If found is True, last3[0] is 3 lines before the matching line
You'll need to modify this to handle files with fewer than 3 lines, or the case where no line satisfies your condition.
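One possible way to make those modifications, sketched as a function that keeps answer 1's list-slicing approach. The name `nth_line_before_first_match` and passing the condition as a callable are assumptions for illustration; iterating over the file object replaces the `while True` / `readline()` loop so end of file is handled automatically.

```python
import io


def nth_line_before_first_match(f, condition, n=3):
    """Return the line n lines before the first line for which condition(line) is true.

    Returns None if no line matches, or if the match comes before n lines
    have been read (which covers files with fewer than n lines).
    """
    last_n = []  # the most recent lines, oldest first, never more than n
    for line in f:  # the loop ends cleanly at end of file
        if condition(line):
            return last_n[0] if len(last_n) == n else None
        last_n = (last_n + [line])[-n:]  # append, then keep only the last n
    return None


sample = io.StringIO(
    '<prop type="ltattr-match">1-1</prop>\n'
    "id =>3</prop>\n"
    '<tuv xml:lang="en">\n'
    "<seg> He is not a good man </seg>\n"
)
print(nth_line_before_first_match(sample, lambda line: line.startswith("<seg>")))
```

Returning `None` for both edge cases keeps the caller's check simple; if you need to tell "no match" apart from "match too early", you could raise an exception for one of them instead.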
Answer 2 (score: 1)
with open("mybigfile.txt") as file:
    lines = file.readlines()
for idx, line in enumerate(lines):
    # idx >= 3 avoids a negative index silently wrapping to the end of the list
    if line.startswith("<seg>") and idx >= 3:
        line_to_detect = lines[idx-3]
        # use idx-2 if you want the _second_ line before this one,
        # e.g. `id =>3</prop>`
        print("This line was detected:")
        print(line_to_detect)
Result:
This line was detected:
<prop type="ltattr-match">1-1</prop>
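As we previously discussed in chat, this approach can use a lot of memory for very large files, because readlines() loads the entire file into a list. But 100 pages is not very large, so this should be fine.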
Answer 3 (score: 0)
file = "path/to/the/file"
with open(file, "r") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    if "<seg> He is not a good man </seg>" in line:
        print(lines[i])  # lines[i] is the matching line; lines[i-1] is the one before it
If you want the line before the match, change it to print(lines[i-1]).