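The idea is to write N files using N processes.
The data for each output file comes from multiple input files, which are stored in a dictionary with lists as values, like this: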
dic = {'file1':['data11.txt', 'data12.txt', ..., 'data1M.txt'],
       'file2':['data21.txt', 'data22.txt', ..., 'data2M.txt'],
       ...
       'fileN':['dataN1.txt', 'dataN2.txt', ..., 'dataNM.txt']}
So file1 is data11 + data12 + ... + data1M, and so on.
My code looks like this:
jobs = []
for d in dic:
    outfile = str(d)+"_merged.txt"
    with open(outfile, 'w') as out:
        p = multiprocessing.Process(target = merger.merger, args=(dic[d], name, out))
        jobs.append(p)
        p.start()
    out.close()
And merger.py looks like this:
def merger(files, name, outfile):
    time.sleep(2)
    sys.stdout.write("Merging %n...\n" % name)
    # the reason for this step is that all the different files have a header
    # but I only need the header from the first file.
    with open(files[0], 'r') as infile:
        for line in infile:
            print "writing to outfile: ", name, line
            outfile.write(line)
    for f in files[1:]:
        with open(f, 'r') as infile:
            next(infile) # skip first line
            for line in infile:
                outfile.write(line)
    sys.stdout.write("Done with: %s\n" % name)
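I do see the files being created in the folder where they should go, but they are empty. No header, nothing. I put print statements in there to check that everything is correct, but nothing helps.
Help!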
Answer 0 (score: 2)
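Since the worker processes run in parallel with the main process that creates them, the file named out gets closed before the workers can write to it. This happens even if you remove the out.close() call, because the with statement closes the file as well. Instead, pass each process the file name and let the process open and close the file itself.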
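A minimal sketch of that suggestion, assuming Python 3. The names merge_files and merge_all and the demo data in the __main__ block are hypothetical, not part of the question's code. Each worker receives only paths, opens its own output file, and the with block flushes and closes it in the same process that wrote to it.

```python
import multiprocessing
import os
import tempfile


def merge_files(input_paths, output_path):
    """Concatenate input_paths into output_path, keeping only the first header."""
    # The worker opens the output itself, so the file is flushed and closed
    # by the process that actually writes to it.
    with open(output_path, "w") as out:
        for index, path in enumerate(input_paths):
            with open(path, "r") as infile:
                if index > 0:
                    next(infile, None)  # skip the header line of later files
                for line in infile:
                    out.write(line)


def merge_all(groups, out_dir):
    """Start one process per output file, then wait for all of them."""
    jobs = []
    for name, input_paths in groups.items():
        output_path = os.path.join(out_dir, name + "_merged.txt")
        p = multiprocessing.Process(target=merge_files,
                                    args=(input_paths, output_path))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()


if __name__ == "__main__":
    # Hypothetical demo data standing in for the question's dic.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, "data1%d.txt" % i)
            with open(path, "w") as f:
                f.write("header\nrow %d\n" % i)
            paths.append(path)
        merge_all({"file1": paths}, tmp)
        with open(os.path.join(tmp, "file1_merged.txt")) as merged:
            print(merged.read())
```

Passing paths rather than open file objects also keeps the arguments picklable, which matters on platforms where multiprocessing starts children with spawn instead of fork.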
Answer 1 (score: 2)
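The problem is that you don't close the file in the child, so the internally buffered data is lost. You can either move the file open into the child or wrap the whole thing in a try/finally block to make sure the file gets closed. One potential advantage of opening the file in the parent is that you can handle file errors there. I'm not saying it's compelling, just an option.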
def merger(files, name, outfile):
    try:
        time.sleep(2)
        sys.stdout.write("Merging %s...\n" % name)
        # the reason for this step is that all the different files have a header
        # but I only need the header from the first file.
        with open(files[0], 'r') as infile:
            for line in infile:
                print "writing to outfile: ", name, line
                outfile.write(line)
        for f in files[1:]:
            with open(f, 'r') as infile:
                next(infile) # skip first line
                for line in infile:
                    outfile.write(line)
        sys.stdout.write("Done with: %s\n" % name)
    finally:
        # close (and thereby flush) the file in the child, even on error
        outfile.close()
Update
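There was some confusion about parent/child file descriptors and what happens to files in the child. If a file is still open when the program exits, the underlying C library does not flush its data to disk. The theory is that a properly running program closes things before it exits. Here is an example where the child loses data because it does not close the file.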
import multiprocessing as mp
import os
import time

if os.path.exists('mytestfile.txt'):
    os.remove('mytestfile.txt')

def worker(f, do_close=False):
    time.sleep(2)
    print('writing')
    f.write("this is data")
    if do_close:
        print("closing")
        f.close()

print('without close')
f = open('mytestfile.txt', 'w')
p = mp.Process(target=worker, args=(f, False))
p.start()
f.close()
p.join()
print('file data:', open('mytestfile.txt').read())

print('with close')
os.remove('mytestfile.txt')
f = open('mytestfile.txt', 'w')
p = mp.Process(target=worker, args=(f, True))
p.start()
f.close()
p.join()
print('file data:', open('mytestfile.txt').read())
I ran it on Linux and got:
without close
writing
file data:
with close
writing
closing
file data: this is data