Question

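I am trying to print the line 4 lines back (the "-4th" line) based on a condition. I have a text file, SFU.txt, with some content. My goal: if a line contains the word configuration, I want to print the line 4 lines before it. For example, suppose my file looks like this: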

This is a random text document
We are talking about planets here
This is planet Mars
in solarsystem
sun is the star
this is 4th planet
configuration lifeform exists
bla bla bla
bla bla bla

So, as soon as the interpreter hits the line configuration lifeform exists and sees configuration, I want to print the line This is planet Mars.

My code so far:

file = open("SFU.txt","r")
for line in file:
    if "configuration" in line:
        #want to print the -4th line-HOW?

Answer 1

A bounded-size deque is a good way to keep a "ring buffer" of the last few lines:

import collections

lastfewlines = collections.deque((), 4)

with open('SFU.txt') as f:
    for line in f:
        if 'configuration' in line and len(lastfewlines) == 4:
            print(lastfewlines[0])
        lastfewlines.append(line.rstrip())

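However, while this solves the problem as stated in the question, it does not address the "real problem" that the OP mentioned only in a comment: "editing" that line, which presumably means changing the input file "in place".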

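Alas, modern filesystems do not allow a file to be "edited in place" except by overwriting it byte for byte. Unless the "edited" line has exactly the same number of bytes as the original, you cannot just overwrite the original line and expect all the following lines in the file to shift back and forth as needed.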

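Instead, you must read the file, make the changes, and then write it out again. The soundest approach is usually to write a new file and then rename it to the old filename, if the operating system and filesystem will let you, to avoid losing data if a crash occurs.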

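The deque approach can be adapted to this case: instead of just conditionally printing lastfewlines[0], write either the original or the modified version of that line to the output file (and at the end, write whatever is left in the deque to the output file as well). Then, at least on Unix systems and local filesystems, a simple os.rename will do the trick atomically (as long as the output file is on the same mounted disk as the input file).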
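The adaptation described above could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the function name edit_fourth_previous and its parameters (keyword, edit, distance) are hypothetical, and it uses os.replace instead of os.rename because os.replace also overwrites an existing target on Windows.

```python
import collections
import os
import tempfile


def edit_fourth_previous(path, keyword, edit, distance=4):
    """Rewrite `path`, applying `edit` to the line that sits `distance`
    lines before every line containing `keyword` (hypothetical helper)."""
    pending = collections.deque()  # lines read but not yet written out
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the input so that the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for line in src:
                # pending[0] is the line `distance` lines back only once
                # the buffer is full.
                if keyword in line and len(pending) == distance:
                    pending[0] = edit(pending[0])
                pending.append(line)
                if len(pending) > distance:
                    out.write(pending.popleft())
            out.writelines(pending)  # write what is still buffered
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temporary file behind
        raise
```

A call such as edit_fourth_previous("SFU.txt", "configuration", lambda s: s.upper()) would upper-case the line This is planet Mars in the sample file and leave every other line unchanged.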

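However, for all but truly huge files, it is much simpler to read all the lines into memory (with f.readlines()), make the changes (if any) in the list of lines, and then write the whole lot out again. Since the user mentioned 16,000 lines (length unspecified, but assuming fewer than 100 bytes per line on average), this small file of under 2 megabytes should be handled in the simplest way. It is orders of magnitude smaller than any file that would raise "too big to fit in memory" concerns.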
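For a file of that size, the read-everything approach might look like the sketch below. The name edit_in_memory and its parameters are hypothetical, and note that this version rewrites the file directly, so it does not have the crash safety of the write-then-rename approach.

```python
def edit_in_memory(path, keyword, edit, distance=4):
    """Load all lines, edit the line `distance` lines before each line
    containing `keyword`, then write everything back (hypothetical helper)."""
    with open(path) as f:
        lines = f.readlines()

    for i, line in enumerate(lines):
        # i >= distance guards against a negative index wrapping around
        # to the end of the list.
        if keyword in line and i >= distance:
            lines[i - distance] = edit(lines[i - distance])

    with open(path, "w") as f:
        f.writelines(lines)
```

Here edit receives the full line including its trailing newline, so it should return a string that still ends with one.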

Answer 2

Use tee to run a pair of iterators over inf. This keeps only five lines in memory at any given time:

from itertools import tee

with open("SFU.txt") as inf:
    # set up iterators
    cfg,res = tee(inf)
    # advance cfg by four lines
    for i in range(4):
        next(cfg)

    for c,r in zip(cfg, res):
        if "configuration" in c:
            print(r.rstrip())  # drop the trailing newline so the output is not double-spaced

And, as expected, the result is

This is planet Mars

Edit: if you want to edit the 4th line back, I would suggest

def edited(r):
    # make your changes to r here; this placeholder returns it unchanged
    new_r = r
    return new_r

with open("SFU.txt") as inf, open("edited.txt", "w") as outf:
    # set up iterators
    cfg, res = tee(inf)
    for i in range(4):
        next(cfg)

    # iterate through in tandem
    for c, r in zip(cfg, res):
        if "configuration" in c:
            r = edited(r)
        outf.write(r)

    # reached end - write out remaining queued values
    for r in res:
        outf.write(r)

Answer 3

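If you only have a few lines, you can use readlines() to store the lines in a list and then simply use indexing:

with open("SFU.txt", "r") as f:
    my_file = f.readlines()
for i, line in enumerate(my_file):
    if "configuration" in line:
        print(my_file[i-4].rstrip())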

But note that if i<4, the index i-4 is negative, so it picks your line counting from the end of the list!
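To make the pitfall concrete, here is a small illustration with made-up data; the helper line_before is hypothetical and simply returns None when there is no line that far back.

```python
lines = ["zero", "one", "two", "three", "four", "five"]

# With i = 1, the index i - 4 is -3, which silently counts from the end.
i = 1
wrapped = lines[i - 4]


def line_before(lines, i, distance=4):
    """Return the line `distance` places before index i, or None if the
    list does not reach that far back (hypothetical helper)."""
    return lines[i - distance] if i >= distance else None
```

In this example wrapped is "three", which is clearly not a line that precedes index 1, while line_before(lines, 1) returns None.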

Answer 4

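If you have a longer file and do not want to read the whole thing into memory, you can use an efficient queue implementation such as collections.deque, like this: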

import collections

# This is a fixed-length queue that holds at most 4 items
lines = collections.deque([''] * 4, 4)

with open("SFU.txt", "r") as myfile:
    for line in myfile:
        if 'configuration' in line:
            # lines[0] is the 4th previous line ('' if fewer than 4 were read)
            print(lines[0].rstrip())
        # always push the new line, which drops the oldest one;
        # skipping matching lines would throw off the count for later matches
        lines.append(line)

Answer 5

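Maybe try something like this.

Once the whole thing has been copied into a list, all of the text is editable. When you are done, you can write it back to the file.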

with open("SFU.txt", "r") as f:
    lines = [line.strip() for line in f]

for i, line in enumerate(lines):
    if "configuration" in line:
        if i >= 4:  # a 4th previous line exists from index 4 onward
            print(lines[i - 4])
            # edit here
        else:
            print('There is no -4th line')

Answer 6

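Alternatively, you can open the file twice and advance one handle so that it reads 4 lines ahead. Then check the line ahead first and, if it matches, print the current line, as follows: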

with open('SFU.txt', 'r') as f:
    with open('SFU.txt', 'r') as next_f:
        for _ in range(4):  # advance next_f past the first 4 lines
            next(next_f)
        for line in next_f:
            if 'configuration' in line:  # keyword found 4 lines ahead
                print(next(f).rstrip())  # this is the current line from f
                break  # stop at the first match
            next(f)  # not found: advance f to the next line

This yields the result:

This is planet Mars

As a side note: try not to use file as a variable name, since it shadows a Python built-in name.

Print a previous line based on a condition in Python
