How to jump to a line by searching for characters in Python

Time: 2016-10-19 00:25:14

Tags: python loops text

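If I have a text file with a bunch of random text before I get to the stuff I actually want, how do I move the file pointer there?

Say, for example, my text file looks like this: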

#foeijfoijeoijoijfoiej ijfoiejoi jfeoijfoifj  i jfoei joi jo ijf eoij oie jojf
#feoijfoiejf   ioj oij       oi jo ij   i joi jo ij oij  ####  oijroijf 3## # o
#foeijfoiej i jo i  iojf 3 ##  #io joi joij oi j## io joi joi j3# 3i ojoi joij
# The stuff I care about

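(The hash marks are part of the actual text file)

How do I move the file pointer to the line with the stuff I care about, and then how do I get Python to tell me the number of that line and start reading the file from there?

I've tried looping to find the line that the last hash mark is on and then reading from there, but I still need to get rid of the hash mark, and I need the line number.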

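2 Answers:

Answer 0: (score: 0)

Try using the readlines function. It returns a list containing every line. You can use a for loop to go through each line, search for what you need, and then get the number of that line from its index in the list. For example: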

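with open('some_file_path.txt') as f:
    contents = f.readlines()
target = '#the line I am looking for'  # renamed: `object` shadows a built-in
line_num = None
for index, line in enumerate(contents):
    if target in line:
        line_num = index  # 0-based index into contents; add 1 for a 1-based line number
        break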

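To get rid of the pound sign, just use the replace function, e.g. new_line = line.replace('#',''). Note that this removes every '#' in the line, not only the leading one.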
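
As a hedged illustration of the two steps above (find the line number, then drop the marker character), here is a minimal sketch. The helper name find_line and the sample text are assumptions made for the example, not part of the answer. It uses lstrip('#') so that only leading hash marks are removed.

```python
import io

def find_line(lines, needle):
    """Return (1-based line number, text without leading '#') for the
    first line containing needle, or None if no line matches."""
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            # lstrip('#') removes only leading hash marks; strip() trims
            # the remaining surrounding whitespace and the newline.
            cleaned = line.lstrip('#').strip()
            return lineno, cleaned
    return None

# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "#junk ## more junk\n"
    "# The stuff I care about\n"
)
result = find_line(sample.readlines(), "The stuff I care about")
print(result)  # (2, 'The stuff I care about')
```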

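Answer 1: (score: 0)

You can't seek to it directly without knowing the size of the junk data or scanning through the junk data. But it's not hard to wrap the file in itertools.dropwhile, which discards lines until you see the "good" data and then iterates over all the remaining lines: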

import itertools

# Or def a regular function that returns True until you see the line
# delimiting the beginning of the "good" data
not_good = '# The stuff I care about\n'.__ne__

with open(filename) as f:
    for line in itertools.dropwhile(not_good, f):
        pass  # placeholder: you'll iterate the lines at and after the good line
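
To see what dropwhile keeps, here is a small self-contained sketch that runs on an in-memory file. The helper name lines_from_marker and the sample text are assumptions made for the example. Once the marker line is seen, later lines pass through unfiltered, even if they start with '#'.

```python
import io
import itertools

MARKER = '# The stuff I care about\n'

def lines_from_marker(f, marker=MARKER):
    """Yield the marker line and every line after it."""
    # marker.__ne__ is True for every line that differs from the marker,
    # so dropwhile discards lines until the first exact match.
    return itertools.dropwhile(marker.__ne__, f)

# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "#junk a\n"
    "#junk b\n"
    "# The stuff I care about\n"
    "first real line\n"
    "# later comment is kept\n"
)
kept = list(lines_from_marker(sample))
```

The comparison is exact, so the marker must include the trailing newline, and a marker on the final line of a file without a trailing newline would not match.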

If you actually need the file object positioned correctly, not just the lines, this variant should work:

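import io
import itertools

# Bytes version of the predicate, because the file is opened in binary mode
not_good = b'# The stuff I care about\n'.__ne__

# Binary mode: Python 3 text files reject nonzero relative seeks
with open(filename, 'rb') as f:
    # Get first good line
    good_start = next(itertools.dropwhile(not_good, f))

    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)

    # f is now positioned at the beginning of the line that begins the good data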
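
If the file has to stay in text mode, one alternative is to record the position with tell() before each readline() and seek back to it on a match. This is a sketch under assumptions, not the answer's method: seek_to_line is a hypothetical helper name. It uses readline() rather than a for loop because Python 3 disables tell() on text files while they are being iterated with next().

```python
import io

def seek_to_line(f, marker):
    """Position f at the start of the first line equal to marker.
    Return True if found, False if the end of the file is reached."""
    while True:
        pos = f.tell()       # where the upcoming line starts
        line = f.readline()
        if not line:         # empty string means end of file
            return False
        if line == marker:
            f.seek(pos)      # rewind to the start of the matching line
            return True

# Hypothetical sample standing in for the real file
sample = io.StringIO("#junk\n# The stuff I care about\ndata\n")
found = seek_to_line(sample, "# The stuff I care about\n")
first = sample.readline()
```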

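If you really need it, you can tweak this to get the actual line number (rather than just the needed offset). It's a bit less readable though, so if you need to do this, explicit iteration via enumerate may make more sense (left as an exercise). The way to make Python do the work for you is: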

import io
import itertools
from operator import itemgetter
# Python 3: map and zip are already lazy, so no future_builtins import is needed

# Bytes version of the predicate, because the file is opened in binary mode
not_good = b'# The stuff I care about\n'.__ne__

# Binary mode: Python 3 text files reject nonzero relative seeks
with open(filename, 'rb') as f:
    linectr = itertools.count()
    # Get first good line
    # Pair each line with a 0-up number to advance the count generator, but
    # strip it immediately so not_good only processes lines, not line nums
    good_start = next(itertools.dropwhile(not_good, map(itemgetter(0), zip(f, linectr))))

    good_lineno = next(linectr)  # Keeps the 1-up line number by advancing once

    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)

    # f is now positioned at the beginning of the line that begins the good data
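
The explicit enumerate approach mentioned above as an exercise might look like the following sketch. The helper name find_good_line is hypothetical, and the sketch assumes the file is opened in binary mode and positioned at its start, so that summing len(line) gives a byte offset usable with an absolute seek.

```python
import io

def find_good_line(f, marker):
    """Return (1-based line number, byte offset) of the first line equal
    to marker and leave f positioned at the start of that line.
    Return None if the marker is not found.

    Assumes f is a binary file object positioned at offset 0."""
    offset = 0
    for lineno, line in enumerate(f, start=1):
        if line == marker:
            f.seek(offset)   # absolute seek back to the start of the line
            return lineno, offset
        offset += len(line)  # bytes consumed so far
    return None

# Hypothetical sample standing in for a file opened with mode 'rb'
data = b"#junk one\n#junk two\n# The stuff I care about\nreal data\n"
f = io.BytesIO(data)
result = find_good_line(f, b"# The stuff I care about\n")
next_line = f.readline()
```

Tracking the offset by hand avoids the relative seek entirely, at the cost of requiring that reading starts from the beginning of the file.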