Julia: fast file write on the fly

Time: 2016-08-31 12:31:30

Tags: performance io julia

I am coding a solver that needs to write a few numbers to a file at each time step. The time step must be small, so I need to write the output often.

This picture shows the code profiling. As you can see, the highlighted IO section takes a conspicuous part of the execution time.

The IO is done as

println(out_file, t, " ", v.P[1], " ", v.P[end])

where I want to save the first and last element of the vector P inside the data structure v as well as the value of t.

From the profiling, it seems that most of the computational time is taken by functions in string.jl (which are not defined by me).

This makes me wonder whether there is a more efficient way to write the output to file iteratively.

Any suggestions?

Thanks

Additional info

The output file is opened once at the beginning of the execution and left open until the end. I cannot post the entire code as it is very long, but it is something like

out_file = open("file.out", "w")

delta_t = computeDeltaT()
t = 0.0  # Float64 from the start, so t keeps one type when delta_t is added
while t<T
  P = computeP()

  println(out_file, t, " ", P[1], " ", P[end])

  delta_t = computeDeltaT()
  t += delta_t
end

close(out_file)

I need to write iteratively because the solution develops in time and I do not know how delta_t will change. So I cannot pre-allocate P. Also, it would be a huge matrix, something like millions by 5.

EDIT

@isebarn Printing only every 100 steps does indeed reduce the execution time. I will also try adding a second worker to handle the IO so that I do not lose data.

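1 answer:

Answer 0 (score: 2)

By "iteratively", do you mean that another application/program must be able to read the file between writes? Otherwise, you can just open the stream once and close it at the end.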

f = open(outfile,"w") # do this once
for i in someloop
    # do something
    write(f, "whatever") # write to stream but not flushed to disk
end
close(f) # now everything is flushed to the disk (i.e. now outfile will have changed)

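If you need to access the file while the process is running, you could open/close it on every iteration (write may be faster than println; profile it to check), or you could open/close the stream every N iterations to balance the two.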
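A variation on the "every N iterations" idea, as a minimal sketch rather than a tested recommendation: keep the stream open and call flush every N steps instead of closing and reopening it. The function name write_steps, the keyword flush_every, and the sin/cos stand-ins for P[1] and P[end] are made up for illustration; whether this is faster in the real solver would need profiling.

```julia
# Sketch: one open stream, flushed every `flush_every` iterations so that a
# reader of the file sees recent data without the stream being reopened.
# `write_steps`, `flush_every` and the fake values are hypothetical.
function write_steps(path::AbstractString, nsteps::Integer; flush_every::Integer = 100)
    open(path, "w") do io
        for i in 1:nsteps
            t = 0.01 * i                      # stand-in for the solver time
            p_first, p_last = sin(t), cos(t)  # stand-ins for P[1] and P[end]
            println(io, t, " ", p_first, " ", p_last)
            if i % flush_every == 0
                flush(io)                     # hand the buffered bytes to the OS
            end
        end
    end                                       # the do-block closes the stream
    return path
end

path = write_steps(tempname(), 250)
```

With a larger flush_every there are fewer flush calls, but a reader of the file sees the data later; the best value depends on the workload.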

Edit: Source: http://docs.julialang.org/en/release-0.4/manual/networking-and-streams/

As @isebarn said, writing binary data to HDF5 might also be faster. I am not sure, though.
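To illustrate the binary idea without an extra package, here is a rough sketch that writes raw Float64 values with Base's write instead of HDF5, which skips the number-to-string conversion. It assumes a recent Julia (1.x) and that the file is read back on the same machine, since the bytes are stored in native byte order. The names write_binary and read_binary and the sin/cos stand-ins are hypothetical.

```julia
# Sketch: store three Float64 values per step as raw bytes (24 bytes per step).
# `write_binary`, `read_binary` and the fake values are hypothetical.
function write_binary(path::AbstractString, nsteps::Integer)
    open(path, "w") do io
        for i in 1:nsteps
            t = 0.01 * i                  # stand-in for the solver time
            write(io, t, sin(t), cos(t))  # stand-ins for t, P[1], P[end]
        end
    end
    return path
end

# Read the file back as a 3 x nsteps matrix: one column per time step.
function read_binary(path::AbstractString)
    nvals = filesize(path) ÷ sizeof(Float64)
    data = Vector{Float64}(undef, nvals)
    open(path, "r") do io
        read!(io, data)                   # fill `data` from the raw bytes
    end
    return reshape(data, 3, :)
end

bpath = write_binary(tempname(), 10)
m = read_binary(bpath)
```

The resulting file is not human-readable, so this only fits if the post-processing can read binary data.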

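IO is also often a limiting factor in these scenarios. Another thing to try: if there is a way to estimate the size of P, could you preallocate it and then trim it?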
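The preallocate-then-trim suggestion could look roughly like the sketch below. It assumes that only the three values per step need to be kept (not the full P), that they fit in memory, and that a rough step-count estimate is available. The function collect_steps, the keyword estimate, the fixed dt, and the sin/cos stand-ins are all hypothetical.

```julia
# Sketch: preallocate from an estimate, grow if the estimate was too small,
# and trim the unused tail at the end.
# `collect_steps`, `estimate` and the fake values are hypothetical.
function collect_steps(T::Float64, dt::Float64; estimate::Int = 1_000)
    rows = Vector{NTuple{3,Float64}}(undef, estimate)
    n = 0
    t = 0.0
    while t < T
        n += 1
        if n > length(rows)
            resize!(rows, max(1, 2 * length(rows)))  # double the capacity
        end
        rows[n] = (t, sin(t), cos(t))  # stand-ins for t, P[1], P[end]
        t += dt                        # the real solver would recompute dt here
    end
    resize!(rows, n)                   # drop the slots that were never filled
    return rows
end

rows = collect_steps(1.0, 0.25; estimate = 2)
```

The collected rows can then be written to file in one go after the loop, at the cost of losing the data if the run crashes midway.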