Question

My data set is basically a matrix of 3 variables (inputs) and a matrix of 1 variable (target). There are 50 samples of each (basically 50 samples of f(x,y,z) = t).

I have only ever done ANN training through the GUI. I have never really used scripts or code.

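My simplest goal right now is to split the data manually for each train-test run, so that I can laboriously run the neural network 5 times. But I am not even sure how to manually select one range of the samples for training and another for testing.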

Here is the full script exported from MATLAB. The part I am asking about is below the wall of code.

% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by NFTOOL
% Created Mon Jul 17 02:39:31 SGT 2017
%
% This script assumes these variables are defined:
%
%   DEinp - input data.
%   DEcgl - target data.

inputs = DEinp;
targets = DEcgl;

% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);

% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};


% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideMode = 'sample';  % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;

% For help on training function 'trainlm' type: help trainlm
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainlm';  % Levenberg-Marquardt

% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse';  % Mean squared error

% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
  'plotregression', 'plotfit'};


% Train the Network
[net,tr] = train(net,inputs,targets);

% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)

% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets  .* tr.valMask{1};
testTargets = targets  .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)

% View the Network
view(net)

% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(net,inputs,targets)
%figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)

I think what I need to do is change the net.divideMode part, but I really don't know how to change the syntax to accomplish my goal.

Answer 1

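Network Parameters

The splitting of your data into training, validation, and test sets happens in the section you identified. I'll break it down line by line, starting with: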

% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideMode = 'sample';  % Divide up every sample

divideMode is documented in detail in Neural Network Object Properties:

net.divideMode

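This property defines the target data dimensions which to divide up when the data division function is called. Its default value is 'sample' for static networks and 'time' for dynamic networks. It may also be set to 'sampletime' to divide targets by both sample and timestep, 'all' to divide up targets by every scalar value, or 'none' to not divide up data at all (in which case all data is used for training, none for validation or testing).

Your network is static, so the data are divided sample by sample. This stays the same for your cross-validation. What you want to manipulate is the split into training, validation, and test sets.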

net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;

OK, the variable names here look promising, but you need more control than just choosing the ratio sizes.

Neural Network Object Properties once again points us to more information:

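net.divideParam

This property defines the parameters and values of the current data-division function. To get a description of what each field means, type the following command: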

help(net.divideFcn)

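This prints information about how the data set is divided into training, validation, and test splits. In the current configuration, the message reads: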

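dividerand Partition indices into three sets using random indices.

[trainInd,valInd,testInd] = dividerand(Q,trainRatio,valRatio,testRatio) takes a number of samples Q and divides up the sample indices 1:Q between training, validation and test indices.

dividerand randomly assigns sample indices to three sets according to three ratios.

(...)

See also divideblock, divideind, divideint, dividetrain.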

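Since you need more control over the partitions, you should look at these other options.

I think the most promising one is divideind. It lets you specify the indices of each partition yourself. You can compute the indices for each fold of a k-fold cross-validation and use this option to reassign the partitions on every iteration.

To set this up, replace the net.divideParam lines above with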
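To get a feel for what these division functions do before committing to one, they can be called directly, outside of any network. The snippet below is only an illustrative sketch: the sample count of 50 comes from the question, the 70/15/15 ratios from the generated script, and the variable names are made up for this example.

```matlab
Q = 50;  % number of samples, as in the question

% Random assignment with the 70/15/15 ratios from the generated script
[trainR, valR, testR] = dividerand(Q, 0.70, 0.15, 0.15);

% Same ratios, but assigned as contiguous blocks of indices
[trainB, valB, testB] = divideblock(Q, 0.70, 0.15, 0.15);

% Report how many samples ended up in each set
fprintf('dividerand:  %d train, %d val, %d test\n', ...
    numel(trainR), numel(valR), numel(testR));
fprintf('divideblock: %d train, %d val, %d test\n', ...
    numel(trainB), numel(valB), numel(testB));
```

Neither function lets you say which particular samples go where, which is why the ratio-based options are not enough for cross-validation.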

net.divideFcn = 'divideind';
net.divideParam.Q = length(targets); %This is the total number of instances in your data 
net.divideParam.trainInd = your_train_ind;
net.divideParam.valInd = your_val_ind;
net.divideParam.testInd = your_test_ind;
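As a concrete illustration of a single manual split, here is a minimal sketch that trains on the first 40 samples and tests on the last 10. The random data are placeholders standing in for DEinp and DEcgl, the 40/10 split is an arbitrary choice for this example, and the validation set is deliberately left empty, which means training runs without validation-based early stopping.

```matlab
% Placeholder data standing in for DEinp (3-by-50) and DEcgl (1-by-50)
inputs = rand(3, 50);
targets = sum(inputs, 1);

net = fitnet(10);
net.trainParam.showWindow = false;    % suppress the training GUI

% Choose the partitions by explicit index instead of by ratio
net.divideFcn = 'divideind';
net.divideParam.trainInd = 1:40;      % samples used for training
net.divideParam.valInd = [];          % no validation set (assumption)
net.divideParam.testInd = 41:50;      % samples held out for testing

[net, tr] = train(net, inputs, targets);

% Evaluate on the held-out samples only
testOutputs = net(inputs(:, tr.testInd));
testPerformance = perform(net, targets(:, tr.testInd), testOutputs);
```

Running this five times with a different block of 10 test indices each time is exactly the laborious manual version of 5-fold cross-validation described in the question.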

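K-Fold

One last detail: how do you choose the indices? First, a quick review of k-fold cross-validation.

The data are split into k equally sized subsamples.
In each iteration of the cross-validation, we train on k-1 of the subsamples and test on the remaining one, moving to a new test subsample each time.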
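The two steps above can be sketched in base MATLAB as follows. This is one possible way to assign folds, not the only one: each shuffled position is given a fold label with mod, so fold sizes differ by at most one when the sample count is not divisible by k. The names shuffled, foldId, testFolds, and trainFolds are made up for this example.

```matlab
Q = 50;      % number of samples, as in the question
k = 5;       % number of folds
rng(0);      % fix the seed so the shuffle is repeatable

shuffled = randperm(Q);          % random order of the sample indices 1..Q
foldId = mod(0:Q-1, k) + 1;      % fold label 1..k for each shuffled position

testFolds = cell(1, k);
trainFolds = cell(1, k);
for fold = 1:k
    testFolds{fold} = shuffled(foldId == fold);    % held-out subsample
    trainFolds{fold} = shuffled(foldId ~= fold);   % remaining k-1 subsamples
end
```

Every sample appears in exactly one test fold, and within an iteration the training and test indices never overlap.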

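A sketch of an implementation might look like this

k = 5; % As an example, let's let k = 5
Q = length(targets); % Total number of instances in your data
sample_size = floor(Q/k); % Number of instances in each test subsample

% Make a vector of all the indices of your data from 1 to the total number of instances
indices = 1:Q;

% Optional: Randomize samples
indices = randperm(Q);

% Iterate over the k folds
for fold = 1:k

    % Position of the first test index of this fold
    ii = (fold - 1)*sample_size + 1;

    % Grab one subsample of indices for testing
    your_test_ind = indices(ii:ii + sample_size - 1);

    % Everything else is used for training
    your_train_ind = indices([1:ii - 1, ii + sample_size:end]);

    % No validation set in this sketch
    your_val_ind = [];

    % Train and test your network here!
end

This is only a sketch of an implementation and does not handle every edge case. For example, if the number of instances is not divisible by k, the leftover instances always end up in the training set and are never tested. But it should be enough to get you started.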
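One way the "train and test your network here" step could be filled in is shown below. Treat it as a sketch under assumptions rather than a finished implementation: the random data are placeholders for DEinp and DEcgl, a fresh network is created for every fold so that no weights carry over, no validation set is used, and the per-fold test error is simply averaged at the end.

```matlab
% Placeholder data standing in for DEinp (3-by-50) and DEcgl (1-by-50)
inputs = rand(3, 50);
targets = sum(inputs, 1);

k = 5;
Q = size(targets, 2);          % samples are stored one per column
sample_size = floor(Q/k);
indices = randperm(Q);
testPerf = zeros(1, k);        % test MSE of each fold

for fold = 1:k
    ii = (fold - 1)*sample_size + 1;
    testInd = indices(ii:ii + sample_size - 1);
    trainInd = indices([1:ii - 1, ii + sample_size:end]);

    net = fitnet(10);                      % fresh network for every fold
    net.trainParam.showWindow = false;     % suppress the training GUI
    net.divideFcn = 'divideind';
    net.divideParam.trainInd = trainInd;
    net.divideParam.valInd = [];           % no validation set (assumption)
    net.divideParam.testInd = testInd;

    [net, tr] = train(net, inputs, targets);

    % Score the fold on its held-out samples only
    outputs = net(inputs(:, tr.testInd));
    testPerf(fold) = perform(net, targets(:, tr.testInd), outputs);
end

meanTestPerf = mean(testPerf);  % cross-validated estimate of the test MSE
```

If you want early stopping, you could carve a validation subset out of trainInd on each iteration and assign it to net.divideParam.valInd instead of leaving it empty.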
