Question

I would appreciate comments or help on a strategy I applied in one of my analyses. In short, my situation is:

1) My data are of biological origin, collected over a period of 120 s from a
 subject who receives, each time, one of three possible stimuli (response label
 1 to 3) in random order, one stimulus per second (trial). The sampling
 frequency is 256 Hz and there are 61 different sensors (input variables). So my
 dataset has 120x256 rows and 62 columns (1 response label + 61 input
 variables);
2) My goal is to identify whether there is an underlying pattern for each
 stimulus. For that, I would like to use deep learning neural networks to test
 my hypothesis, but not in the conventional way (predicting the stimulus from a
 single observation/row);
3) My approach is to shuffle the whole dataset by row (to avoid any time bias),
 divide it into training and validation sets (50/50), and then run the deep
 learning algorithm. The division does not segregate the trial events (120), so
 the training and validation sets should both contain data (rows) from the
 same trials (but never the same row). If there is a dominant pattern per
 stimulus, the validation confusion matrix error should be low. If there is a
 dominant pattern per trial, the validation confusion matrix error should be
 high. So the validation confusion matrix error is my indicator of the
 presence of a hidden pattern per stimulus;
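
To make step 3 concrete, here is a minimal sketch of the row-wise shuffle and 50/50 split. It works on row indices only (no sensor data). The function names and the seed are hypothetical, chosen for illustration; the sizes come from the description above. It also counts how many trials contribute rows to both sets, to confirm that the split does not segregate trials.

```python
import random

N_TRIALS = 120   # one trial per second over 120 s
FS = 256         # rows per trial at a 256 Hz sampling frequency

def rowwise_split(n_trials=N_TRIALS, fs=FS, seed=0):
    """Shuffle all row indices and split them 50/50, ignoring trial boundaries."""
    rows = list(range(n_trials * fs))
    rng = random.Random(seed)   # fixed seed (an assumption) for repeatability
    rng.shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]

def trials_covered(rows, fs=FS):
    """Return the set of trial indices that contribute at least one row."""
    return {r // fs for r in rows}

train_rows, valid_rows = rowwise_split()

# Trials that appear in both the training and the validation set.
shared_trials = trials_covered(train_rows) & trials_covered(valid_rows)

print(len(train_rows), len(valid_rows))   # sizes of the two halves
print(len(shared_trials), "of", N_TRIALS, "trials appear in both sets")
```

With 256 rows per trial, a random row-wise split is expected to place rows from essentially every trial in both sets, which is the property the reasoning in step 3 relies on.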

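I would appreciate any opinion on the validity of this logic. I would like to emphasize that I am not trying to predict the stimulus from a row of inputs.

Thanks.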

Answer 1

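Yes, you can use above-chance classification performance on the cross-validation set as evidence that there is a pattern or relationship within the examples of each class. The argument would be stronger if similar performance were found on a separate, never-before-seen test set.

If a deep neural network, an SVM, or any other classifier can classify better than chance, it means: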

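There is information (a pattern) in the training-set samples about each predicted class
AND that pattern is learnable by the classifier
AND that information is not specific to the training set (no overfitting)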

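So, if classification performance is above chance, all 3 conditions above are true. If it is not, one or more of the conditions may be false. The training variables may not contain any information that helps predict the classes. Or the predictor variables do contain it, but the relationship between them and the classes is too complex for the classifier to learn. Or the classifier overfit, and performance on the CV set is at chance or worse.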
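
As a rough illustration of what "better than chance" could mean with three stimuli, the sketch below computes accuracy from a validation confusion matrix and the binomial probability of doing at least that well by guessing. The confusion-matrix counts are made up and the function names are hypothetical. It assumes the three stimuli are equally likely (chance = 1/3) and that rows are independent, which may not hold for consecutive samples of a 256 Hz recording, so treat the p-value as optimistic.

```python
import math

def accuracy(confusion):
    """Fraction of rows on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def p_value_above_chance(n_correct, n_total, n_classes=3):
    """P(X >= n_correct) for X ~ Binomial(n_total, 1 / n_classes)."""
    p = 1.0 / n_classes
    logs = [log_binom_pmf(k, n_total, p) for k in range(n_correct, n_total + 1)]
    m = max(logs)  # factor out the largest term to avoid underflow
    return min(1.0, math.exp(m) * sum(math.exp(v - m) for v in logs))

# Made-up validation confusion matrix (rows: true label, columns: predicted).
confusion = [[40, 30, 30],
             [30, 40, 30],
             [30, 30, 40]]

n_total = sum(sum(row) for row in confusion)
n_correct = sum(confusion[i][i] for i in range(len(confusion)))
acc = accuracy(confusion)
p_val = p_value_above_chance(n_correct, n_total)

print(f"accuracy = {acc:.3f}, chance = {1 / 3:.3f}, p = {p_val:.4f}")
```

A small p-value only says the accuracy is unlikely under pure guessing; it does not by itself rule out overfitting to trial-specific structure shared between the training and validation rows.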

Here is an open-access paper that uses similar logic to argue that fMRI activity contains information about the image a person is viewing:

Natural Scene Categories Revealed in Distributed Patterns of Activity in the Human Brain

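Note: depending on the classifier used (especially DNNs, less so decision trees), this will only tell you whether there is a pattern; it will not tell you what that pattern is.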

Heterodox use of deep learning to find hidden patterns