I am trying to move FCN training from BrainScript to a C++ program. As a first step I am simply loading and retraining an existing model. I am getting somewhere, but trainer->TrainMinibatch() is throwing an exception (and I could not work out how to get a description of the exception). Rough code below:
CNTK::DeviceDescriptor& device= CNTK::DeviceDescriptor::GPUDevice(gpuid);
FunctionPtr rootFunc = nullptr;
try {
    rootFunc = Function::Load(modelname, device);
}
catch (char *err) {
    printf("Load fail: %s\n", err);
    return;
}
catch (...) {
    printf("Load fail\n");
    return;
}
std::cerr << "Loaded model ok" << std::endl;
MinibatchSourcePtr minibatchSource;
try {
    minibatchSource = HG_CreateMinibatchSource(64);
}
catch (char* err) {
    std::cerr << "Failed to init src: " << err << std::endl;
    return;
}
catch (...) {
    std::cerr << "Failed to init src" << std::endl;
    return;
}
auto imageStreamInfo = minibatchSource->StreamInfo(L"features");
auto labelStreamInfo = minibatchSource->StreamInfo(L"labels"); // We don't use the labels since this is an FCN
auto inputImageShape = imageStreamInfo.m_sampleLayout;
std::cerr << "Input Shape: " << inputImageShape.AsString() << std::endl;
auto imageInputName = L"features";
auto imageInput = InputVariable(inputImageShape, imageStreamInfo.m_elementType, imageInputName);
auto classifierOutput = rootFunc;
//EITHER - construct error from output+target
std::wstring outputLayerName = L"op";
FunctionPtr outputLayer = rootFunc->FindByName(outputLayerName);
std::wstring targetLayerName = L"opool3";
FunctionPtr targetLayer = rootFunc->FindByName(targetLayerName);
// OR - just get from network
std::wstring errLayerName = L"e";
FunctionPtr errLayer = rootFunc->FindByName(errLayerName);
std::cerr << "Setup-got op layer" << outputLayer->Output().Shape().AsString() << std::endl;
std::cerr << "Setup-got tgt layer" << targetLayer->Output().Shape().AsString() << std::endl;
std::cerr << "Setup-got err layer" << errLayer->Output().Shape().AsString() << std::endl;
auto trainingLoss = CNTK::SquaredError(outputLayer, targetLayer);
auto prediction = CNTK::SquaredError(outputLayer, targetLayer);
LearningRateSchedule learningRatePerSample = TrainingParameterPerSampleSchedule(5e-8);
// Either
auto trainer = CreateTrainer(classifierOutput, trainingLoss->Output(), prediction->Output(), { SGDLearner(classifierOutput->Parameters(), learningRatePerSample) });
// Or
//auto trainer = CreateTrainer(classifierOutput, errLayer, errLayer, { SGDLearner(classifierOutput->Parameters(), learningRatePerSample) });
const size_t minibatchSize = 1;
size_t numMinibatchesToTrain = 100;
size_t outputFrequencyInMinibatches = 10;
try {
    for (size_t i = 0; i < numMinibatchesToTrain; ++i)
    {
        std::cerr << "Iteration: " << i << std::endl;
        auto minibatchData = minibatchSource->GetNextMinibatch(minibatchSize, device);
        std::cerr << " got data for " << imageInput.AsString() << std::endl;
        trainer->TrainMinibatch({ { imageInput, minibatchData[imageStreamInfo] } }, device); // This line throws exception!
        std::cerr << "Eval=" << trainer->PreviousMinibatchEvaluationAverage() << "," << trainer->PreviousMinibatchLossAverage() << std::endl;
    }
}
// Question edited as result of comment on exceptions below
catch (const std::exception& err) {
    std::cerr << "Training error:" << err.what() << std::endl;
}
catch (...) {
    std::cerr << "Training error" << std::endl;
}
It is not clear to me how the loss function should be defined (I am guessing here; there is effectively no documentation). The network has the loss ('e') used by CNTK.exe/BrainScript, which is the squared error between the output ('op') and the target ('opool3'). I have tried both using 'e' directly and defining the error in C++ with CNTK::SquaredError(). Both give the same output, indicating an exception thrown by trainer->TrainMinibatch:
Loaded model ok
Input Shape: [1024 x 1024 x 3]
Setup-got op layer [63 x 127 x 3]
Setup-got tgt layer [63 x 127 x 3]
Setup-got err layer []
Iteration: 0
 got data for Input('features', [1024 x 1024 x 3], [*, #])
Training error:Values for 1 required arguments 'Input('features', [1024 x 1024 x 3], [, #])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('Block233_Output_0', [], [, #]), Output('aggregateEvalMetric', [], [])' depend on, have not been provided.
What am I doing wrong here?
Thanks!
d
Edit: the exception is:
Training error: Values for 1 required arguments 'Input('features', [1024 x 1024 x 3], [, #])', that the requested output(s) 'Output('aggregateLoss', [], []), Output('Block233_Output_0', [], [, #]), Output('aggregateEvalMetric', [], [])' depend on, have not been provided.
Update: after looking at the CNTK code (CompositeFunction.cpp), the problem seems to be a mismatch between the input provided and the input required:
Provided variable: Input('features', [1024 x 1024 x 3], [*, #])
Required argument: Input('features', [1024 x 1024 x 3], [, #])
The difference is [*, #] vs [, #].
Not sure how to fix it!
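One way to see what the loaded graph actually requires is to print its own argument variables and compare them with the variable I construct. A minimal sketch, assuming the rootFunc and imageInput variables from the code above and the CNTK 2.x C++ API (Function::Arguments() and Variable::AsString()):
// List the input Variables the loaded Function actually depends on;
// values must be bound to these objects, not to a newly created InputVariable.
for (const auto& arg : rootFunc->Arguments())
    std::wcerr << L"Model argument: " << arg.AsString() << std::endl;
// The variable constructed in my code, for comparison:
std::wcerr << L"My variable: " << imageInput.AsString() << std::endl;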
Answer (score: 0):
The problem is because imageInput is a new variable that has no relation to the network parameters. Instead, you need to get the input variable associated with the network and bind that variable to the minibatch data, e.g.
std::unordered_map<Variable, ValuePtr> inputDataMap = { { classifierOutput->Arguments()[0], minibatchData[imageStreamInfo].data } };
and then pass inputDataMap to TrainMinibatch. Also see this evaluation example (training and evaluation have very similar APIs).
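Applied to the training loop in the question, a minimal sketch (assuming the same rootFunc, minibatchSource, imageStreamInfo, trainer, minibatchSize and numMinibatchesToTrain as above, and using the TrainMinibatch overload that takes a map of MinibatchData):
// Use the input Variable that belongs to the loaded network, not a new InputVariable.
Variable modelInput = rootFunc->Arguments()[0];
for (size_t i = 0; i < numMinibatchesToTrain; ++i)
{
    auto minibatchData = minibatchSource->GetNextMinibatch(minibatchSize, device);
    // Bind the feature stream to the network's own input variable.
    std::unordered_map<Variable, MinibatchData> arguments = { { modelInput, minibatchData[imageStreamInfo] } };
    trainer->TrainMinibatch(arguments, device);
    std::cerr << "Loss=" << trainer->PreviousMinibatchLossAverage() << std::endl;
}
The key point is that the Variable in the argument map must be the one owned by the loaded Function, so the dependency check can match the provided value to the required argument.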