Question

def splicearticles():
    countart = 1
    new_file = "article1.txt"
    with open("newspaperarticles.txt", "r") as my_file:
        with open("temporaryarticle.txt", "a+") as my_temporary:
            for line in my_file:
                if line.strip() != "Reserved by Author":
                    currentline = line.strip() + "\n"
                    my_temporary.write(currentline)
                else:
                    with open(new_file, "w") as my_final:
                        my_final.write(my_temporary.read())
                    countart += 1
                    new_file = "article" + str(countart) + ".txt"
                    my_temporary.truncate(0)

The problem seems to be with my_final.write(my_temporary.read()), since every other part of the code executes. Can anyone tell me what I'm doing wrong?

Answer 1

                with open(new_file, "w") as my_final:
                    my_final.write(my_temporary.read())

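When my_temporary.read() runs here, the file position is at the end of the temporary file, so the read call returns nothing. Try moving the file position back to the start of the file first.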

                with open(new_file, "w") as my_final:
                    my_temporary.seek(0)
                    my_final.write(my_temporary.read())
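
To see the effect in isolation, here is a minimal sketch that writes one line to a scratch file opened in "a+" mode and then reads it back, with and without rewinding. The helper name read_after_write and the scratch file are hypothetical and exist only for this demonstration.

```python
import os
import tempfile


def read_after_write(rewind):
    """Write one line in "a+" mode, optionally rewind, then read."""
    # Hypothetical scratch file, created only for this demonstration.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "a+") as tmp:
            tmp.write("first line\n")
            if rewind:
                tmp.seek(0)  # move the file position back to the start
            return tmp.read()
    finally:
        os.remove(path)


print(repr(read_after_write(rewind=False)))  # nothing is read: position is at the end
print(repr(read_after_write(rewind=True)))   # the line that was just written
```

Without the seek, the read starts where the last write ended, which is why the article files in the question come out empty.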

Alternatively, don't use a temporary file object at all. You can just as easily store the lines in a list.

def splicearticles():
    countart = 1
    new_file = "article1.txt"
    with open("newspaperarticles.txt", "r") as my_file:
        temp = []
        for line in my_file:
            if line.strip() != "Reserved by Author":
                currentline = line.strip() + "\n"
                temp.append(currentline)
            else:
                with open(new_file, "w") as my_final:
                    my_final.write("".join(temp))
                countart += 1
                new_file = "article" + str(countart) + ".txt"
                temp = []
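
Note that, like the code in the question, the function above only writes an article when it reaches a marker, so any text after the last "Reserved by Author" line is never written out. If that trailing text should count as an article too (an assumption, not something the question states), one possible variation is to separate the splitting from the writing. The names split_articles, write_articles and name_template are made up for this sketch.

```python
def split_articles(lines, marker="Reserved by Author"):
    """Yield the text of each article found in an iterable of lines."""
    buffer = []
    for line in lines:
        if line.strip() == marker:
            yield "".join(buffer)
            buffer = []
        else:
            buffer.append(line.strip() + "\n")
    # Assumption: non-blank text after the last marker is also an article.
    if any(entry.strip() for entry in buffer):
        yield "".join(buffer)


def write_articles(source_path, name_template="article{}.txt"):
    """Write each article to its own numbered file and return the count."""
    count = 0
    with open(source_path, "r") as source:
        for count, text in enumerate(split_articles(source), 1):
            with open(name_template.format(count), "w") as out:
                out.write(text)
    return count
```

Calling write_articles("newspaperarticles.txt") would then produce article1.txt, article2.txt and so on, including a final file for an article that has no closing marker.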

Answer 2

This might be easier with a regular expression.

Suppose you have a file containing 3 articles:

Article 1
Blah blah blah blah did blah

Reserved by Author

Article 2
This article goes on and on
end of that article
Reserved by Author

Now we have article 3
blah blah
And it goes on

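You can memory-map the file so that it looks like an in-memory string (a bytes-like object in Python 3) without having to load it all into memory. This lets you run a regular expression over the file's contents and split on the "Reserved by Author" delimiter: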

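import re
import mmap

fn = '/tmp/articles.txt'

# Open in binary mode: in Python 3 an mmap object is bytes-like, so the
# pattern must be a bytes pattern and the output is written as bytes.
with open(fn, 'rb') as articles:
    mf = mmap.mmap(articles.fileno(), 0, access=mmap.ACCESS_READ)
    # Lazily capture everything up to the next marker line or the end of the file.
    chunks = re.finditer(rb'(.+?)(?:Reserved by Author\s*\n|\Z)', mf, re.S | re.M)
    for i, block in enumerate(chunks, 1):
        text = block.group(1)
        with open('/tmp/article {}.txt'.format(i), 'wb') as fout:
            fout.write(text)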

With this simple example, we create 3 new files:

$ cat "article 1.txt"
Article 1
Blah blah blah blah did blah

$ cat "article 2.txt"
Article 2
This article goes on and on
end of that article

$ cat "article 3.txt"
Now we have article 3
blah blah
And it goes on
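
To check how the pattern splits the text without touching any files, it can be run against a bytes literal. This is only a sketch: the sample below is a shortened, made-up stand-in for the articles file, and PATTERN is just a name chosen here for the compiled expression.

```python
import re

# Same expression as in the mmap example, compiled as a bytes pattern.
PATTERN = re.compile(rb'(.+?)(?:Reserved by Author\s*\n|\Z)', re.S | re.M)

# Shortened, made-up stand-in for the contents of the articles file.
sample = (
    b"Article 1\nBlah\n\n"
    b"Reserved by Author\n\n"
    b"Article 2\nmore text\n"
    b"Reserved by Author\n\n"
    b"Article 3\nlast one\n"
)

chunks = [match.group(1) for match in PATTERN.finditer(sample)]
for number, chunk in enumerate(chunks, 1):
    print(number, chunk)
```

The lazy group stops at the first marker it meets, and the \s*\n after the marker swallows the blank line that follows it, so each chunk starts at the first line of the next article. The last chunk ends at \Z because no marker follows it.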

Splice a text file when a marker is reached
