Question

If I have a text file with a bunch of random text before the content I actually want, how do I move the file pointer to that content?

Say, for example, my text file looks like this:

#foeijfoijeoijoijfoiej ijfoiejoi jfeoijfoifj  i jfoei joi jo ijf eoij oie jojf
#feoijfoiejf   ioj oij       oi jo ij   i joi jo ij oij  ####  oijroijf 3## # o
#foeijfoiej i jo i  iojf 3 ##  #io joi joij oi j## io joi joi j3# 3i ojoi joij
# The stuff I care about

(The hash characters are part of the actual text file.)

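How do I move the file pointer to the line with the stuff I care about, get Python to tell me the number of that line, and start reading the file from there?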

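I've tried writing a loop to find the line containing the last hash character and then reading from there, but I still need to get rid of the hash character, and I need the line number.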

Answer 1

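Try using the readlines function. It returns a list containing each line. You can use a for loop to go through the lines, searching for what you need, and then get the number of the line from its index in the list. For example: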

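with open('some_file_path.txt') as f:
    contents = f.readlines()
# Text that identifies the line of interest (named target to avoid
# shadowing the builtin object)
target = '#the line I am looking for'
line_num = None
# enumerate yields the index directly; contents.index(target) would raise
# ValueError because each stored line also carries its trailing newline
for index, line in enumerate(contents):
    if target in line:
        line_num = index  # 0-based; add 1 for a human-style line number
        break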

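To get rid of the pound sign, just use the replace function, e.g. new_line = line.replace('#',''). Note that replace removes every '#' in the line; line.lstrip('#') removes only the leading ones.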
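
Putting the two steps together, a minimal sketch might look like the following. The helper name find_marker_line and the sample text are made up for illustration, and it assumes the line of interest can be identified by a substring.

```python
import io


def find_marker_line(lines, marker):
    """Return (index, cleaned_line) for the first line containing marker.

    index is 0-based; cleaned_line has leading '#' characters and
    surrounding whitespace removed. Returns (None, None) if nothing matches.
    """
    for index, line in enumerate(lines):
        if marker in line:
            return index, line.lstrip('#').strip()
    return None, None


# Hypothetical sample standing in for the real file contents
sample = io.StringIO(
    "#junk ## junk\n"
    "#more junk # 3\n"
    "# The stuff I care about\n"
    "data line\n"
)
contents = sample.readlines()
line_num, cleaned = find_marker_line(contents, 'The stuff I care about')
print(line_num + 1, cleaned)  # 1-based line number and the cleaned text
```

Returning the index instead of calling contents.index avoids a second scan of the list and works even when the same line text appears more than once.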

Answer 2

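You can't seek to it directly without either knowing the size of the junk data or scanning through it. But it's not too hard to wrap the file in itertools.dropwhile to discard lines until you see the "good" data, after which it iterates through all the remaining lines: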

import itertools

# Or def a regular function that returns True until you see the line
# delimiting the beginning of the "good" data
not_good = '# The stuff I care about\n'.__ne__

with open(filename) as f:
    for line in itertools.dropwhile(not_good, f):
        # You'll iterate the lines at and after the good line;
        # printing is only a placeholder for the real processing
        print(line, end='')
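
The comment above mentions defining a regular function instead of using the bound __ne__ method. A rough sketch of that variant follows; the startswith test and the in-memory sample are assumptions made for illustration, not part of the original answer.

```python
import io
import itertools


def not_good(line):
    """Return True for junk lines, i.e. until the marker line is seen."""
    # Assumption: the good data starts at the first line beginning with this text
    return not line.startswith('# The stuff I care about')


# Hypothetical in-memory stand-in for the real file
f = io.StringIO(
    "#foeij ## oij\n"
    "#feoij 3## # o\n"
    "# The stuff I care about\n"
    "first data line\n"
)
# dropwhile stops calling not_good after the first False result, so later
# lines are passed through even if they look like junk
kept = list(itertools.dropwhile(not_good, f))
print(kept)
```

A prefix test like this is a little more forgiving than exact equality, since it does not depend on the exact line ending.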

If you really need the file object itself positioned properly, not just the lines, this variant should work:

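import io
import itertools

# Bytes version of the predicate: the file is opened in binary mode below
# because Python 3 text files reject nonzero seeks relative to SEEK_CUR.
# Assumes the file uses '\n' line endings.
not_good_bytes = b'# The stuff I care about\n'.__ne__

with open(filename, 'rb') as f:
    # Get first good line
    good_start = next(itertools.dropwhile(not_good_bytes, f))

    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)

    # f is now positioned at the beginning of the line that begins the good data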
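
If the file has to stay in text mode, one possible alternative (not from the original answer) is to record f.tell() before each readline() and seek back to that absolute offset. The function name seek_to_marker and the demo file are hypothetical.

```python
import os
import tempfile

MARKER = '# The stuff I care about'  # assumed marker text


def seek_to_marker(f, marker=MARKER):
    """Position text file f at the start of the first line starting with marker.

    Returns the 1-based line number, or None if the marker is not found.
    Uses readline() rather than iteration, since tell() is disabled on a
    text file while it is being iterated with next().
    """
    lineno = 0
    while True:
        pos = f.tell()          # offset of the line about to be read
        line = f.readline()
        if not line:            # end of file, marker never seen
            return None
        lineno += 1
        if line.startswith(marker):
            f.seek(pos)         # absolute seek to a value from tell() is allowed
            return lineno


# Hypothetical demo file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("#junk ## 3\n#more junk\n# The stuff I care about\ndata\n")
    path = tmp.name

with open(path) as f:
    found_at = seek_to_marker(f)
    rest = f.read()
os.remove(path)
print(found_at, repr(rest))
```

This trades the compact dropwhile form for an explicit loop, but it yields both the position and the line number in one pass.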

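If you really need it, you can tweak that to get the actual line number as well (rather than just the offset you need). It's a little less readable, though, so explicit iteration via enumerate may make more sense if you need to do that (left as an exercise). The way to make Python do the work for you is: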

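import io
import itertools
from operator import itemgetter
# On Python 2, also: from future_builtins import map, zip  (lazy versions)

# Bytes version of the predicate: the file is opened in binary mode below
# because Python 3 text files reject nonzero seeks relative to SEEK_CUR.
# Assumes the file uses '\n' line endings.
not_good_bytes = b'# The stuff I care about\n'.__ne__

with open(filename, 'rb') as f:
    linectr = itertools.count()
    # Get first good line
    # Pair each line with a 0-up number to advance the count generator, but
    # strip it immediately so the predicate only processes lines, not line nums
    good_start = next(itertools.dropwhile(not_good_bytes, map(itemgetter(0), zip(f, linectr))))

    good_lineno = next(linectr)  # Keeps the 1-up line number by advancing once

    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)

    # f is now positioned at the beginning of the line that begins the good data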
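
For comparison, the explicit enumerate version left as an exercise above could look roughly like this. The function name skip_to_good and the in-memory sample are assumptions for illustration.

```python
import io


def skip_to_good(lines, marker='# The stuff I care about\n'):
    """Advance the iterator `lines` past the junk.

    Returns (lineno, line) for the marker line, with lineno counted from 1,
    or (None, None) if the marker never appears. Afterwards `lines` yields
    only the lines that follow the marker.
    """
    for lineno, line in enumerate(lines, start=1):
        if line == marker:
            return lineno, line
    return None, None


# Hypothetical in-memory stand-in for the real file
f = io.StringIO("#junk\n#junk ## 3\n# The stuff I care about\nrow 1\nrow 2\n")
good_lineno, good_start = skip_to_good(f)
remaining = f.readlines()
print(good_lineno, remaining)
```

Because the loop returns as soon as the marker is found, the file object is left positioned just after the marker line, so no seek is needed when only the following lines matter.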

How to skip to a line via a character search in Python
