Question

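I am referring to the example in the documentation on parallel loops and trying to adapt it to my use case. In my case, each independent iteration produces a DataFrame, and the results from all iterations ultimately need to be combined with vcat(). Here is a simplified version of what I have tried so far: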

using DataFrames, Distributed

function test()
    if length(workers()) < length(Sys.cpu_info())
        addprocs(length(Sys.cpu_info()); exeflags="--project=" * Base.active_project())
    end

    nheads = @distributed (vcat) for i = 1:20
        DataFrame(a=[Int(rand(Bool))])
    end
end

But when I run test(), I get this error message:

ERROR: On worker 2: UndefVarError: DataFrame not defined

What do I need to do to fix this?

Answer 1

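The using DataFrames ... statement on the first line applies only to the main process. Your worker processes therefore never load the package they need.

To fix this, you should prefix that first line with the @everywhere macro. That makes every process load the packages.

Edit

I just noticed that you call addprocs inside the function. In that case my suggestion won't work, because the workers do not exist yet when the first line runs. Here is a working version: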

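using Distributed

# Start the workers first, at top level, so they exist before any @everywhere call.
addprocs(length(Sys.cpu_info()))

# Load DataFrames on the main process and on every worker.
@everywhere using DataFrames

function test()
    # Each iteration builds a one-row DataFrame; vcat reduces them into one.
    nheads = @distributed (vcat) for i = 1:20
        DataFrame(a=[Int(rand(Bool))])
    end
end

test()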
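
If you want to confirm that the package really is available on each worker before running the loop, something like the following might help. It is only a sketch: the worker count of 2 and the variable names loaded and df are my own choices for illustration, and if DataFrames is installed in a project environment you may still need the exeflags argument from the question when calling addprocs.

```julia
using Distributed

# Assumption: two workers are enough for a quick check.
addprocs(2)

# Load DataFrames on the main process and on every worker.
@everywhere using DataFrames

# Ask each worker whether the DataFrames module is visible in its Main module.
loaded = [remotecall_fetch(() -> isdefined(Main, :DataFrames), w) for w in workers()]
println("DataFrames loaded on all workers: ", all(loaded))

# Same loop as in the answer: one single-row DataFrame per iteration, reduced with vcat.
df = @distributed (vcat) for i = 1:20
    DataFrame(a = [Int(rand(Bool))])
end
println(nrow(df), " rows collected")
```

The result should have one row per iteration, so 20 rows here, whatever the random values turn out to be.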
