Running Matlab batch jobs on an HPC cluster

Time: 2014-05-20 10:26:39

Tags: matlab batch-processing parfor

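I am trying to get Matlab to execute a number of scripts as separate batch jobs. Each script loads some data from Excel sheets and implements a neural network. The neural network internally uses a parfor loop for parameter tuning.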

When I run the batch job on my local machine, it works fine. My Matlab code looks like

job1 = batch('Historical1Step',...
'Profile', 'local',...
'Matlabpool', 3,...
'CaptureDiary',true,...
'CurrentDirectory', '.');

try
    job1Results = fetchOutputs(job1);
catch err
    delete(job1);
    rethrow(err);
end
delete(job1);

The diary output I get is

--- Start Diary ---
Analysing data for stock BAX

num_its =

 2

100%[============================
100%[===================================================]

--- End Diary ---

However, when I change the 'local' profile to my server profile, I get

--- Start Diary ---
--- End Diary ---
Error using parallel.Job/fetchOutputs (line 869)
An error occurred during execution of Task with ID 1.

Error in SOExample (line 14)
    job1Results = fetchOutputs(job1);

Caused by:
    Index exceeds matrix dimensions.

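I assume the problem is related to the visibility of my functions/data on the workers, but I have tried every combination of the 'FileDependencies' and 'PathDependencies' options of the batch function that I can think of, to no avail.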

Any help would be much appreciated, and apologies in advance if I am doing something really stupid without realising it!

Edit:

The error stack is as follows:

Index exceeds matrix dimensions.

Error in Historical1Step (line 13)


Error in parallel.internal.cluster/executeScript (line 22)
eval(['iClearAndSetCallerWorkspace(workspaceIn);' scriptName]);

Error in parallel.internal.evaluator/evaluateWithNoErrors (line 14)
        [out{1:nOut}] = feval(fcn, args{:});

Error in parallel.internal.evaluator/CJSStreamingEvaluator/evaluate (line 31)
            [out, errOut] = parallel.internal.evaluator.evaluateWithNoErrors( fcn, nOut, args );

Error in dctEvaluateTask>iEvaluateTask/nEvaluateTask (line 281)
        [output, errOutput, cellTextOutput{end+1}] = evaluator.evaluate(fcn, nOut, args);

Error in dctEvaluateTask>iEvaluateTask (line 141)
    nEvaluateTask();

Error in dctEvaluateTask (line 57)
    [resultsFcn, taskPostFcn, taskEvaluatedOK] = iEvaluateTask(job, task, runprop);

Error in distcomp_evaluate_filetask_core>iDoTask (line 149)
dctEvaluateTask(postFcns, finishFcn);

Error in distcomp_evaluate_filetask_core (line 48)
iDoTask(handlers, postFcns);


Error using parallel.Job/fetchOutputs (line 869)
An error occurred during execution of Task with ID 1.

Error in SOExample (line 14)
    job1Results = fetchOutputs(job1);

Caused by:
    Index exceeds matrix dimensions.

The file 'Historical1Step' is the script I am trying to run. Its first lines (up to the point where the code crashes) are:

wrkDir = 'V:\Individual\SOFNN'; % this is where the files are on cluster headnode
wrkFldr = [wrkDir '\ExcelSheets\1-stepAhead\']; % location of excel sheets

%%
folder = dir(wrkFldr);
isub = [folder(:).isdir]; % data is stored in sub-directory based on stock symbol
stockNames = {folder(isub).name}'; % extract stock names from names of sub-dirs
stockNames(ismember(stockNames,{'.','..'})) = []; % remove names '.' and '..'

for i = 1:1 % this loop should read in data for stock i from correct sub-dir
    close all;
    clc;
    sym = stockNames{i};
    disp(['Analysing data for stock ' sym]);
    fldrName = strcat(wrkFldr,'\', sym, '\');
end % added for completion

1 Answer:

Answer 0: (score: 1)

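In your code, you are using a mapped drive letter on the workers. Workers usually cannot see mapped drive letters because of the way their processes are launched. Try using a UNC path instead. There is more information in the documentation here: http://www.mathworks.com/help/distcomp/troubleshooting-and-debugging.html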
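
To illustrate why the failure shows up as "Index exceeds matrix dimensions" at line 13: when a worker cannot see the folder, dir returns an empty struct array, so stockNames ends up empty and stockNames{i} fails. The following is only a minimal sketch of a guard for that case. The helper name listStockFolders is hypothetical (it is not part of the original script), and it assumes the function is saved as listStockFolders.m somewhere the workers can find it.

```matlab
function stockNames = listStockFolders(wrkFldr)
% LISTSTOCKFOLDERS  Hypothetical helper: return the names of the
% sub-folders of wrkFldr (one per stock symbol) as a column cell array.

% Fail early with a clear message if this process cannot see the folder.
% Without this check, dir() returns an empty struct for a missing folder
% and a later stockNames{1} raises "Index exceeds matrix dimensions".
if exist(wrkFldr, 'dir') ~= 7
    error('listStockFolders:folderNotVisible', ...
          'Cannot see folder: %s', wrkFldr);
end

folder = dir(wrkFldr);
stockNames = {folder([folder.isdir]).name}';          % sub-folders only
stockNames(ismember(stockNames, {'.', '..'})) = [];   % drop '.' and '..'
end
```

In the script, the call would then be something like listStockFolders('\\headnode\share\Individual\SOFNN\ExcelSheets\1-stepAhead'), where \\headnode\share is a placeholder, not the real location: replace it with the server and share name that the V: drive is actually mapped to.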