How can I print to multiple files at once in Julia? Is there a cleaner way than the following?
for f in [open("file1.txt", "w"), open("file2.txt", "w")]
    write(f, "content")
    close(f)
end
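Answer 0 (score: 6)
From your question I assume you do not mean writing in parallel (which probably would not speed things up anyway, since the operation is likely IO-bound).
Your solution has one small problem: if write throws an exception, f is not guaranteed to be closed.
Here are three ways to make sure the file is closed even if an error occurs: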
for fname in ["file1.txt", "file2.txt"]
    open(fname, "w") do f
        write(f, "content")
    end
end

for fname in ["file1.txt", "file2.txt"]
    open(f -> write(f, "content"), fname, "w")
end

foreach(fn -> open(f -> write(f, "content"), fn, "w"),
        ["file1.txt", "file2.txt"])
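They all give the same result, so the choice is a matter of taste (and you can derive more variations along the same lines).
All three rely on the following method of the open function:
open(f::Function, args...; kwargs...)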
Apply the function f to the result of open(args...; kwargs...)
and close the resulting file descriptor upon completion.
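To make that docstring concrete, here is a rough sketch of what this method of open does, written as an ordinary try/finally. The name with_open is hypothetical (it is not part of Julia), and the real implementation in Base may differ in details; the point is only that close runs whether or not the callback throws.

```julia
# `with_open` is a hypothetical stand-in that mimics open(f, args...; kwargs...).
function with_open(f::Function, args...; kwargs...)
    io = open(args...; kwargs...)
    try
        return f(io)      # hand the open stream to the callback
    finally
        close(io)         # runs whether f returned normally or threw
    end
end

# Usage with do-block syntax, writing into a temporary directory.
path = joinpath(mktempdir(), "file1.txt")
with_open(path, "w") do io
    write(io, "content")
end
```

The do-block form works here for the same reason it works with open: the block becomes the first argument f.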
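Note that if an exception is actually thrown somewhere, processing will still terminate (the only guarantee is that the file descriptor gets closed). To make sure every write is actually attempted, you can do something like this: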
for fname in ["file1.txt", "file2.txt"]
    try
        open(fname, "w") do f
            write(f, "content")
        end
    catch ex
        # here decide what should happen on error
        # you might want to investigate the value of ex here
    end
end
For the documentation of try/catch, see https://docs.julialang.org/en/latest/manual/control-flow/#The-try/catch-statement-1.
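One possible way to fill in that catch block is to record which files failed and keep going, so the caller can decide afterwards. This is only a sketch: write_all is a hypothetical helper, and the "missing" subdirectory exists only to provoke an error.

```julia
# `write_all` is a hypothetical helper, not part of Julia.
# It tries every file and returns a Dict mapping each failed file name
# to the exception that was caught for it.
function write_all(fnames, content)
    failures = Dict{String,Any}()
    for fname in fnames
        try
            open(fname, "w") do f
                write(f, content)
            end
        catch ex
            failures[fname] = ex   # remember the error and move on
        end
    end
    return failures
end

dir = mktempdir()
good = joinpath(dir, "ok.txt")
bad = joinpath(dir, "missing", "bad.txt")   # parent directory does not exist
failures = write_all([bad, good], "content")
```

Here the first open fails, yet ok.txt is still written, and failures holds a single entry for the bad path.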
Answer 1 (score: 3)
If you really want to write in parallel (using multiple processes), you can do it as follows:
using Distributed
addprocs(4) # using, say, 4 worker processes
function ppwrite()
    @sync @distributed for i in 1:10
        open("file$(i).txt", "w") do f
            write(f, "content")
        end
    end
end
For comparison, the serial version is
function swrite()
    for i in 1:10
        open("file$(i).txt", "w") do f
            write(f, "content")
        end
    end
end
On my machine (SSD + quad-core), this gives a ~70% speedup (timed with @btime from BenchmarkTools):
julia> @btime ppwrite();
3.586 ms (505 allocations: 25.56 KiB)
julia> @btime swrite();
6.066 ms (90 allocations: 6.41 KiB)
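However, note that with real content these timings may change significantly, because the content may have to be transferred to the different processes. Also, they will probably not scale, since IO typically becomes the bottleneck at some point.
Update: larger (string) content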
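To illustrate the transfer point, here is a sketch in which every job carries its own path and content, so whatever a worker needs travels with the job instead of being looked up in a global. pwrite_chunks is a hypothetical name, and I have not benchmarked this variant. It assumes it is run from Main, so that the anonymous function can be serialized to workers. If no workers have been added, pmap simply runs on the current process.

```julia
using Distributed

# `pwrite_chunks` is a hypothetical helper, not part of Distributed.
# Each job is a (path, content) pair; pmap ships a job to whichever
# process handles it and returns the number of bytes written per file.
function pwrite_chunks(dir, contents)
    jobs = [(joinpath(dir, "file$(i).txt"), c) for (i, c) in enumerate(contents)]
    pmap(jobs) do (path, c)
        open(path, "w") do f
            write(f, c)
        end
    end
end

dir = mktempdir()
nbytes = pwrite_chunks(dir, ["a", "bb", "ccc"])
```

The result is collected in job order, which makes it easy to check that each file received the expected amount of data.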
julia> using Distributed, Random, BenchmarkTools
julia> addprocs(4);
julia> global const content = [string(rand(1000,1000)) for _ in 1:10];
julia> function ppwrite()
           @sync @distributed for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, content[i])
               end
           end
       end
ppwrite (generic function with 1 method)
julia> function swrite()
           for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, content[i])
               end
           end
       end
swrite (generic function with 1 method)
julia> @btime swrite()
63.024 ms (110 allocations: 6.72 KiB)
julia> @btime ppwrite()
23.464 ms (509 allocations: 25.63 KiB) # ~ 2.7x speedup
Doing the same with the string representations of larger 10000x10000 matrices (3 of them instead of 10) gives
julia> @time swrite()
7.189072 seconds (23.60 k allocations: 1.208 MiB)
julia> @time swrite()
7.293704 seconds (37 allocations: 2.172 KiB)
julia> @time ppwrite();
16.818494 seconds (2.53 M allocations: 127.230 MiB) # > 2x slowdown of first call
julia> @time ppwrite(); # ~30% slowdown of second call
9.729389 seconds (556 allocations: 35.453 KiB)
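Answer 2 (score: 0)
Just adding a coroutine version, which performs the IO in parallel like the multi-process one, but also avoids duplicating and transferring the data.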
julia> using Distributed, Random
julia> global const content = [randstring(10^8) for _ in 1:10];
julia> function swrite()
           for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, content[i])
               end
           end
       end
swrite (generic function with 1 method)
julia> @time swrite()
1.339323 seconds (23.68 k allocations: 1.212 MiB)
julia> @time swrite()
1.876770 seconds (114 allocations: 6.875 KiB)
julia> function awrite()
           @sync for i in 1:10
               @async open("file$(i).txt", "w") do f
                   write(f, "content")  # writes the literal "content", not content[i]
               end
           end
       end
awrite (generic function with 1 method)
julia> @time awrite()
0.243275 seconds (155.80 k allocations: 7.465 MiB)
julia> @time awrite()
0.001744 seconds (144 allocations: 14.188 KiB)
julia> addprocs(4)
4-element Array{Int64,1}:
2
3
4
5
julia> function ppwrite()
           @sync @distributed for i in 1:10
               open("file$(i).txt", "w") do f
                   write(f, "content")  # writes the literal "content", not content[i]
               end
           end
       end
ppwrite (generic function with 1 method)
julia> @time ppwrite()
1.806847 seconds (2.46 M allocations: 123.896 MiB, 1.74% gc time)
Task (done) @0x00007f23fa2a8010
julia> @time ppwrite()
0.062830 seconds (5.54 k allocations: 289.161 KiB)
Task (done) @0x00007f23f8734010
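For reference, a task-based variant that writes a different string to each file could look like the sketch below. awrite_contents is a hypothetical name and I have not timed it. The tasks live in one process and share its memory, so nothing is copied between processes, but whether the writes actually overlap depends on whether the underlying IO calls yield to the scheduler.

```julia
# `awrite_contents` is a hypothetical helper: one task per file,
# all tasks sharing the memory of the current process.
function awrite_contents(dir, contents)
    @sync for (i, c) in enumerate(contents)
        @async begin
            open(joinpath(dir, "file$(i).txt"), "w") do f
                write(f, c)   # each task writes its own element
            end
        end
    end
end

dir = mktempdir()
awrite_contents(dir, ["alpha", "beta", "gamma"])
```

@sync waits until every task started inside the loop has finished, so all files are complete when the function returns.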