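I have a master file that is a collection of newspaper articles, and I need to put each article into its own file. Thankfully, the last line of every article is a copyright notice, so I wrote the following to try to automate what I want: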
def splicearticles():
    countart = 1
    new_file = "article1.txt"
    with open("newspaperarticles.txt", "r") as my_file:
        with open("temporaryarticle.txt", "a+") as my_temporary:
            for line in my_file:
                if line.strip() != "Reserved by Author":
                    currentline = line.strip() + "\n"
                    my_temporary.write(currentline)
                else:
                    with open(new_file, "w") as my_final:
                        my_final.write(my_temporary.read())
                    countart += 1
                    new_file = "article" + str(countart) + ".txt"
                    my_temporary.truncate(0)
The problem seems to be with my_final.write(my_temporary.read()), because every other part of the code executes. Can anyone tell me what I am doing wrong?
Answer 0 (score: 0)
with open(new_file, "w") as my_final:
    my_final.write(my_temporary.read())
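When my_temporary.read() runs here, the file position is at the end of the temporary file, so the read call returns nothing. Try moving the file position back to the start of the file first.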
with open(new_file, "w") as my_final:
    my_temporary.seek(0)
    my_final.write(my_temporary.read())
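To see why the seek matters, here is a minimal self-contained sketch. It uses a throwaway file from tempfile.TemporaryFile as a stand-in for temporaryarticle.txt (an assumption made only to keep the demo self-contained), and the variable names are illustrative.

```python
import tempfile

# Throwaway read/write file (stand-in for temporaryarticle.txt)
with tempfile.TemporaryFile("w+") as tmp:
    tmp.write("first line\nsecond line\n")

    # The position is now at the end of the file, so nothing is left to read
    before_seek = tmp.read()

    # Move back to the start, then read again
    tmp.seek(0)
    after_seek = tmp.read()

print(repr(before_seek))
print(repr(after_seek))
```

The first read comes back empty; only the read after seek(0) returns the two lines that were written.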
Alternatively, don't use a temporary file object at all. You can easily store the lines in a list.
def splicearticles():
    countart = 1
    new_file = "article1.txt"
    with open("newspaperarticles.txt", "r") as my_file:
        temp = []  # lines of the article currently being collected
        for line in my_file:
            if line.strip() != "Reserved by Author":
                currentline = line.strip() + "\n"
                temp.append(currentline)
            else:
                # Copyright line reached: write out the collected article
                with open(new_file, "w") as my_final:
                    my_final.write("".join(temp))
                countart += 1
                new_file = "article" + str(countart) + ".txt"
                temp = []  # start collecting the next article
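Note that the list-based version only writes an article when it reaches a copyright line, so any text after the last copyright line is dropped. The sketch below is one possible variation that also writes that trailing text. The function name split_articles and its parameters (source_path, out_dir, marker) are hypothetical, introduced only for illustration, and treating the trailing text as a final article is an assumption.

```python
import os

def split_articles(source_path, out_dir=".", marker="Reserved by Author"):
    """Write each article to out_dir/articleN.txt and return the paths written."""
    written = []
    lines = []

    def flush():
        # Write the buffered lines as the next numbered article, then reset the buffer
        path = os.path.join(out_dir, "article{}.txt".format(len(written) + 1))
        with open(path, "w") as out:
            out.write("".join(lines))
        written.append(path)
        del lines[:]

    with open(source_path, "r") as src:
        for line in src:
            if line.strip() == marker:
                flush()
            else:
                lines.append(line.strip() + "\n")

    # Assumption: non-blank text after the last marker is a final article
    if any(entry.strip() for entry in lines):
        flush()
    return written
```

Calling split_articles("newspaperarticles.txt") would then behave like the version above, with the extra handling for a last article that has no copyright line.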
Answer 1 (score: 0)
This may be easier if you use a regular expression.
Suppose you have a file representing 3 articles:
Article 1
Blah blah blah blah did blah
Reserved by Author
Article 2
This article goes on and on
end of that article
Reserved by Author
Now we have article 3
blah blah
And it goes on
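You can memory-map the file so that it looks like a string without having to load it all into memory. This lets you run a regular expression over the file contents and split on the "Reserved by Author" delimiter: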
import re
import mmap

fn = '/tmp/articles.txt'
with open(fn, 'rb') as articles:
    # In Python 3 the mapped file is bytes-like, so the pattern must be bytes as well
    mf = mmap.mmap(articles.fileno(), 0, access=mmap.ACCESS_READ)
    chunks = re.finditer(rb'(.+?)(?:Reserved by Author\s*\n|\Z)', mf, re.S | re.M)
    for i, block in enumerate(chunks, 1):
        text = block.group(1)
        # group(1) is bytes, so write the article in binary mode
        with open('/tmp/article {}.txt'.format(i), 'wb') as fout:
            fout.write(text)
With this simple example, we create 3 new files:
$ cat "article 1.txt"
Article 1
Blah blah blah blah did blah
$ cat "article 2.txt"
Article 2
This article goes on and on
end of that article
$ cat "article 3.txt"
Now we have article 3
blah blah
And it goes on
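The splitting pattern can also be checked on its own, without touching the filesystem. The sketch below runs it over an in-memory bytes string that mirrors the sample file. It assumes Python 3, where the pattern has to be a bytes pattern to match bytes-like data; the names sample, pattern and articles are illustrative.

```python
import re

# In-memory copy of the three-article sample file
sample = (
    b"Article 1\nBlah blah blah blah did blah\nReserved by Author\n"
    b"Article 2\nThis article goes on and on\nend of that article\nReserved by Author\n"
    b"Now we have article 3\nblah blah\nAnd it goes on\n"
)

# Lazily capture everything up to the next copyright line or the end of the data
pattern = re.compile(rb"(.+?)(?:Reserved by Author\s*\n|\Z)", re.S)
articles = [m.group(1).decode("ascii") for m in pattern.finditer(sample)]

for number, text in enumerate(articles, 1):
    print("article {}: {} lines".format(number, text.count("\n")))
```

Each captured group is the article text without its copyright line, which matches the three files shown above.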