If a text file has a bunch of random text before the content I actually need, how do I move the file pointer there?
Say my text file looks like this:
#foeijfoijeoijoijfoiej ijfoiejoi jfeoijfoifj i jfoei joi jo ijf eoij oie jojf
#feoijfoiejf ioj oij oi jo ij i joi jo ij oij #### oijroijf 3## # o
#foeijfoiej i jo i iojf 3 ## #io joi joij oi j## io joi joi j3# 3i ojoi joij
# The stuff I care about
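(The hash marks are part of the actual text file.)
How do I move the file pointer to the line with the stuff I care about, have Python tell me that line's number, and start reading the file from there?
I've tried looping to find the line that holds the last hash mark and reading from there, but I still need to strip the hash marks, and I need the line number.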
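Answer 0 (score: 0)
Try the readlines method. It returns a list containing every line. You can loop over that list with a for loop, search each line for what you need, and get the line's number from its index in the list. For example: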
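with open('some_file_path.txt') as f:
    contents = f.readlines()
target = '#the line I am looking for'  # named target to avoid shadowing the builtin object
line_num = None  # stays None if no line matches
for index, line in enumerate(contents):
    if target in line:
        line_num = index  # 0-based list index; use index + 1 for a 1-based line number
        break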
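To get rid of the hash marks, just use the replace method, e.g. new_line = line.replace('#', ''). Note that this removes every '#' in the line; line.lstrip('#') drops only the leading ones.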
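Putting the two steps of this answer together, a minimal sketch might look like the following. The function name split_at_marker and the sample list are made up for illustration, and the sketch assumes the whole file fits comfortably in memory, as readlines does.

```python
def split_at_marker(lines, marker):
    """Find the first line containing marker.

    Returns (1-based line number, that line and all later lines with
    every '#' removed), or (None, []) if no line contains marker.
    """
    for index, line in enumerate(lines):
        if marker in line:
            cleaned = [text.replace('#', '') for text in lines[index:]]
            return index + 1, cleaned
    return None, []


# Stand-in for the list that f.readlines() would return
contents = [
    '#foeijf junk\n',
    '#more ## junk\n',
    '# The stuff I care about\n',
    'data 1\n',
]
line_num, cleaned = split_at_marker(contents, 'The stuff I care about')
```

Here line_num is the line number counted from 1, and cleaned holds the marker line plus everything after it.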
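Answer 1 (score: 0)
You can't seek there directly without either knowing the size of the junk data or scanning through it. But it's not hard to wrap the file in itertools.dropwhile, which discards lines until you see the "good" data and then iterates over all the remaining lines: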
import itertools
# Or def a regular function that returns True until you see the line
# delimiting the beginning of the "good" data
not_good = '# The stuff I care about\n'.__ne__
with open(filename) as f:
    for line in itertools.dropwhile(not_good, f):
        ...  # You'll iterate the lines at and after the good line
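As the comment in the code above suggests, the predicate can also be a regular function. The sketch below is one way to write it; io.StringIO stands in for a real file, the sample lines are invented, and stripping the trailing newline before comparing is an added assumption so that a final line without a newline still matches.

```python
import io
import itertools


def not_good(line):
    """Return True for every line before the one that starts the good data."""
    return line.rstrip('\n') != '# The stuff I care about'


# io.StringIO stands in for a file opened with open(filename)
sample = io.StringIO(
    '#foeijf junk\n'
    '#more ## junk\n'
    '# The stuff I care about\n'
    'first data line\n'
    'second data line\n'
)

# dropwhile skips lines while not_good is True, then yields the rest unchanged
kept = list(itertools.dropwhile(not_good, sample))
```

Once the predicate has returned False, dropwhile stops calling it, so later lines are passed through even if they happen to look like junk.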
If you actually need the file pointer positioned correctly, not just the lines, this variant should work:
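import io
import itertools
# Bytes marker, because the file is opened in binary mode below
not_good = b'# The stuff I care about\n'.__ne__
# Binary mode: on Python 3, text-mode files reject nonzero relative seeks
with open(filename, 'rb') as f:
    # Get first good line
    good_start = next(itertools.dropwhile(not_good, f))
    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)
    # f is now positioned at the beginning of the line that begins the good data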
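If you really need it, you can tweak the seek-back variant to get the actual line number (not just the offset). It's a little less readable, though, so if you need this, explicit iteration with enumerate may make more sense (left as an exercise). The way to make Python do the work for you is: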
import io
import itertools
from operator import itemgetter
# On Python 2 only, also add: from future_builtins import map, zip
# Bytes marker and binary mode, so the relative seek below works on Python 3
not_good = b'# The stuff I care about\n'.__ne__
with open(filename, 'rb') as f:
    linectr = itertools.count()
    # Get first good line
    # Pair each line with a 0-up number to advance the count generator, but
    # strip it immediately so not_good only processes lines, not line nums
    good_start = next(itertools.dropwhile(not_good, map(itemgetter(0), zip(f, linectr))))
    good_lineno = next(linectr)  # Keeps the 1-up line number by advancing once
    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)
    # f is now positioned at the beginning of the line that begins the good data
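For comparison, the explicit loop that this answer leaves as an exercise could be sketched roughly as below. This is not the answer's own code: seek_to_marker and its marker parameter are hypothetical names, and the sketch swaps enumerate for a manual counter around readline(), because on Python 3 a text-mode file disables tell() while it is being iterated with next().

```python
import io


def seek_to_marker(f, marker='# The stuff I care about\n'):
    """Position f at the start of the first line equal to marker.

    Returns the 1-based line number of that line, or None if no line
    matches (f is then left at end of file).
    """
    lineno = 0
    while True:
        pos = f.tell()        # offset of the line about to be read
        line = f.readline()
        if not line:          # empty string means end of file
            return None
        lineno += 1
        if line == marker:
            f.seek(pos)       # rewind to the start of the matching line
            return lineno


# io.StringIO stands in for a file opened in text mode
demo = io.StringIO('#junk\n#more ## junk\n# The stuff I care about\ndata\n')
found_at = seek_to_marker(demo)
first_line = demo.readline()
```

Because the position is recorded with tell() before each read and restored with an absolute seek, this works on a file opened in text mode, at the cost of one tell() call per line.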