Question

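I have a huge trainData and I want to draw random subsets from it (say, 1000 times) and use them to train a neural network object successively. Is this possible with the neuralnet R package? What I have in mind is: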

library(neuralnet)

for (i in 1:1000) {
  classA <- 2000
  classB <- 2000
  dataB <- trainData[sample(which(trainData$class == "B"), classB, replace = TRUE), ] # draw 2000 samples from class B
  dataU <- trainData[sample(which(trainData$class == "A"), classA, replace = TRUE), ] # draw 2000 samples from class A
  subset <- rbind(dataB, dataU) # bind them to make a subset

Then feed this subset of the actual trainData to the neural network object to train it again and again, like:

  nn <- neuralnet(formula, data = subset, hidden = c(3, 5), linear.output = FALSE, stepmax = 2147483647) # use that subset for training the neural network
}

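My question is whether this neuralnet object named nn will be trained further on each iteration of the loop, so that when the loop finishes I end up with a fully trained neural network object. Secondly, when the neural network fails to converge for a particular subset, what is the effect of that non-convergence? Will it affect the prediction results?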

Answer 1

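Shortest answer: no.

More nuanced answer: sort of...

Why? The neuralnet::neuralnet function will not return the weights if the threshold is not reached within stepmax. However, if the threshold is reached, the resulting object will contain the final weights. These weights can then be passed to the neuralnet function as the startweights argument, which allows successive learning. Your call would look like this: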

# nn.prior = previously run neuralnet object

nn <- neuralnet(formula, data=subset, hidden=c(3,5), linear.output = F, stepmax = 2147483647, startweights = nn.prior$weights)
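A minimal sketch of how that call might sit inside the loop from the question. The simulated trainData, the feature names x1 and x2, the numeric 0/1 target y, and the threshold and stepmax values are all assumptions made for illustration. Flattening the weights with unlist() is a precaution of this sketch (the package documentation describes startweights as a vector); the call above passes the list directly. A run that does not converge is detected by its missing weights and skipped, so the last converged weights are carried forward.

```r
library(neuralnet)

set.seed(1)
# Simulated stand-in for trainData (assumption): two features, classes "A"/"B".
n <- 2000
trainData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
trainData$class <- ifelse(trainData$x1 + trainData$x2 > 0, "A", "B")
trainData$y <- as.numeric(trainData$class == "A")  # numeric 0/1 target

nn.prior <- NULL  # last run that converged, if any
for (i in 1:5) {
  # Balanced subset: 200 rows per class, drawn with replacement.
  idxA <- sample(which(trainData$class == "A"), 200, replace = TRUE)
  idxB <- sample(which(trainData$class == "B"), 200, replace = TRUE)
  sub <- trainData[c(idxA, idxB), ]

  # Start from the previous weights when available, otherwise randomly.
  start <- if (is.null(nn.prior)) NULL else unlist(nn.prior$weights)

  nn <- neuralnet(y ~ x1 + x2, data = sub, hidden = c(3, 5),
                  linear.output = FALSE, threshold = 0.1, stepmax = 1e5,
                  startweights = start)

  if (is.null(nn$weights)) {
    # No convergence within stepmax: nothing to carry forward from this run.
    message("iteration ", i, ": did not converge, keeping previous weights")
  } else {
    nn.prior <- nn
  }
}
```

After the loop, nn.prior holds the most recent converged network, or NULL if no run converged. A looser threshold makes convergence more likely but extracts less from each subset, which is the trade-off discussed next.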

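However, I initially answered "no" because choosing a threshold that extracts an appropriate amount of information from each subset, while also ensuring that training "converges" before stepmax, is likely to be a guessing game and not very objective.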

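As far as I can see, you essentially have four options:

1. Find another package that explicitly allows this.
2. Get the neuralnet source code and modify it to return the weights even if "convergence" is not achieved (i.e. the threshold is not reached).
3. Pick one suitably sized random subset, build the model on it, and test its performance. (This is actually pretty common practice, AFAIK.)
4. Take all of the subsets, build a model on each one, and then combine them into an "ensemble" model.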
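The ensemble option could be sketched roughly as follows. The simulated data, the subset size, the number of models, and the choice to combine models by averaging their outputs are all assumptions of this sketch; other ways of combining (voting, weighting by validation error) are equally possible. Runs that fail to converge are simply left out of the ensemble.

```r
library(neuralnet)

set.seed(2)
# Simulated stand-in for trainData (assumption).
n <- 2000
trainData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
trainData$y <- as.numeric(trainData$x1 + trainData$x2 > 0)

# Fit one independent network per random subset; keep only converged fits.
models <- list()
for (i in 1:5) {
  sub <- trainData[sample(nrow(trainData), 300), ]
  fit <- neuralnet(y ~ x1 + x2, data = sub, hidden = 3,
                   linear.output = FALSE, threshold = 0.1, stepmax = 1e5)
  if (!is.null(fit$weights)) {
    models[[length(models) + 1]] <- fit
  }
}
stopifnot(length(models) > 0)  # the sketch needs at least one converged fit

# Average the member outputs for some new rows (one column per model).
newData <- trainData[1:10, c("x1", "x2")]
preds <- sapply(models, function(m) neuralnet::compute(m, newData)$net.result[, 1])
ensemble <- rowMeans(preds)
```

Here ensemble is a vector of averaged outputs between 0 and 1, one per row of newData. Recent versions of neuralnet also provide a predict() method that can be used in place of compute().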

Answer 2

I would suggest using k-fold validation to train many networks, using library(e1071) and its tune function.
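A minimal sketch of what that might look like, assuming the tune.nnet wrapper from e1071. Note that this wrapper fits networks with the nnet package rather than neuralnet, so it is a different single-hidden-layer implementation. The simulated data and the candidate values for size and decay are placeholders.

```r
library(e1071)  # tune.nnet() also requires the nnet package to be installed

set.seed(3)
# Simulated stand-in for trainData (assumption): factor response with classes "A"/"B".
n <- 400
trainData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
trainData$class <- factor(ifelse(trainData$x1 + trainData$x2 > 0, "A", "B"))

# 5-fold cross-validation over a small grid of hidden-layer sizes and weight decays.
tuned <- tune.nnet(class ~ x1 + x2, data = trainData,
                   size = 1:3, decay = c(0, 0.01),
                   tunecontrol = tune.control(sampling = "cross", cross = 5))

print(tuned$best.parameters)   # best size/decay combination found
print(tuned$best.performance)  # its cross-validated classification error
best <- tuned$best.model       # network refit on all rows with those settings
```

Each parameter combination is trained once per fold, so many networks are fitted in total, and the one returned in best.model is selected by cross-validated error rather than by training fit.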

Continuous training of a neural network
