Question

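I previously wrote a file with Python, and when I ran the script a second time, the same content was written to it twice.

This is the content of my file:

Story1: A short story is a piece of prose fiction that can typically be read in one sitting and focuses on a self-contained incident or a series of linked incidents, with the intent of evoking a "single effect" or mood. There are many exceptions to this. A dictionary definition is "an invented prose narrative shorter than a novel, usually dealing with a few characters and aiming at unity of effect and often concentrating on the creation of mood rather than plot. Story1: A short story is a piece of prose fiction that can typically be read in one sitting and focuses on a self-contained incident or a series of linked incidents, with the intent of evoking a "single effect" or mood. There are many exceptions to this. A dictionary definition is "an invented prose narrative shorter than a novel, usually dealing with a few characters and aiming at unity of effect and often concentrating on the creation of mood rather than plot.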

I am using a Python set like this, but it doesn't work in my case:

uniqlines = set(open('file.txt').readlines())
bar = open('file', 'w').writelines(set(uniqlines))

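In my case there are no newline characters, so everything is read in as a single line. I want to be able to delete the content from the second occurrence of Story1: onward. How can I do that?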

Answer 1

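Update: Since you have no newlines to split the file on, it is best to slurp the whole file, split it appropriately, and write a new file. A simple solution is: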

import os, tempfile

with open('file.txt') as f,\
     tempfile.NamedTemporaryFile('w', dir='.', delete=False) as tf:
    # You've got a space only before second copy, so it's a useful partition point
    firstcopy, _, _ = f.read().partition(' Story1: ')
    # Write first copy
    tf.write(firstcopy)
# Exiting with block closes temporary file so data is there
# Atomically replace original file with rewritten temporary file
os.replace(tf.name, 'file.txt')

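Technically, this is not completely safe against an actual power loss, because the data may not have been written to disk before the replace metadata update happens. If you are paranoid, adjust it to block explicitly until the data is synced by adding the following two lines just before the with block exits (after the write):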

    tf.flush()  # Flushes Python level buffers to OS
    os.fsync(tf.fileno())  # Flush OS kernel buffer out to disk, block until done
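
Putting the pieces together, the rewrite and the sync step could be wrapped in one helper. This is only a sketch: the function name keep_first_copy and its separator parameter are made up for illustration, and it assumes, as above, that the second copy is preceded by a single space.

```python
import os
import tempfile


def keep_first_copy(path, separator=' Story1: '):
    """Rewrite `path` so it keeps only the text before `separator`.

    Hypothetical helper; returns False and leaves the file alone
    when the separator does not occur.
    """
    with open(path) as f:
        firstcopy, found, _ = f.read().partition(separator)
    if not found:
        return False  # no second copy, nothing to do

    # Create the temporary file next to the original so os.replace
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as tf:
        tf.write(firstcopy)
        tf.flush()             # Python buffers -> OS
        os.fsync(tf.fileno())  # OS buffers -> disk, blocks until done
    os.replace(tf.name, path)  # swap in the rewritten file
    return True
```

Calling keep_first_copy('file.txt') would then replace the earlier snippet; the return value tells you whether a duplicate was actually found.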

Old answer, for the case where the duplicate copy begins on a separate line:

Find where the second copy starts, then truncate the file:

seen_story1 = False
with open('file.txt', 'r+') as f:
    while True:
        pos = f.tell() # Record position before next line

        line = f.readline()
        if not line:
            break  # Hit EOF

        if line.startswith('Story1:'):
            if seen_story1:
                # Seen it already, we're in duplicate territory
                f.seek(pos)   # Go back to end of last line
                f.truncate()  # Truncate file
                break         # We're done
            else:
                seen_story1 = True  # Seeing it for the first time

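Since all you are doing is removing duplicate information from the end of the file, this is safe and efficient; truncate should be atomic on most operating systems, so the trailing data is freed all at once, with no risk of partial-write corruption or the like.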
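
If you want to reuse this, the same loop can be wrapped in a function. The sketch below is one possible variant, not the only way to do it: the name truncate_at_second_marker is hypothetical, and it opens the file in binary mode so that tell() and seek() work with plain byte offsets (in text mode tell() returns an opaque number).

```python
def truncate_at_second_marker(path, marker=b'Story1:'):
    """Cut the file just before the second line starting with `marker`.

    Hypothetical helper; returns True if the file was truncated.
    """
    seen = False
    with open(path, 'rb+') as f:
        while True:
            pos = f.tell()       # byte offset of the line about to be read
            line = f.readline()
            if not line:
                return False     # hit EOF without finding a second copy
            if line.startswith(marker):
                if seen:
                    f.seek(pos)  # back to the start of the duplicate
                    f.truncate() # drop everything from here on
                    return True
                seen = True      # first occurrence, keep going
```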

Answer 2

You can use the find method.

# set the word you want to look for
myword = "Story1"

# Read the file into a variable called text
with open('file.txt') as fin:
    text = fin.read()

# Find the word for the first time. find() returns the lowest index of the
# substring if it is found, so add the length of the word to move past it.
index_first_time_found = text.find(myword) + len(myword)

# Search again, starting from the end of the first match.
index_second_time_found = text.find(myword, index_first_time_found)

# Cut off everything from the second match onward. find() returns -1 when
# there is no second match; in that case keep the whole text.
if index_second_time_found == -1:
    new_text = text
else:
    new_text = text[:index_second_time_found]

print(new_text)
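
One thing to watch with find: it returns -1 when the substring is absent, and slicing with -1 would silently drop the last character. A small sketch that wraps the lookup in a function and handles that case (the name cut_at_second_occurrence is just for illustration):

```python
def cut_at_second_occurrence(text, word):
    """Return `text` up to, but not including, the second `word`.

    Hypothetical helper; returns `text` unchanged when `word`
    occurs fewer than two times.
    """
    first = text.find(word)
    if first == -1:
        return text  # word never appears
    second = text.find(word, first + len(word))
    if second == -1:
        return text  # only one occurrence
    return text[:second]


print(cut_at_second_occurrence('Story1: a Story1: a', 'Story1'))
```

Note that the result keeps the space that precedes the second copy; call rstrip() on it if you do not want that.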
