MLP training: how to handle unknown feature values

Time: 2014-06-11 07:44:37

Tags: c neural-network fann

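Suppose we train an MLP on a set of feature vectors, some of which contain unknown values. How should I handle this? Is an MLP able to cope with it?

Suppose the training vectors are: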

(1.0, 3.4, unknown, 2.0), (3.1, unknown, 1.2, 0.1), (2.1,3.4,1.2,4.5), ...

I am using FANN.

1 answer:

Answer 0 (score: 0)

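Missing data

You are referring to the missing data problem (Little and Rubin 1987). This is not something a neural network classifier handles well on its own. You should preprocess the data and try to fill in the missing values, either with simple statistical estimates based on the known values of that variable in other instances (1) or with a more advanced algorithm (2).

(1) For example: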

instance1 = 0, 0, 1, 0, 1
instance2 = 0, 0, 1, 0, 1
instance3 = 1, 1, 1, 0, 0
instanceX = 1, 1, 1, 0, ?

# The statistical approach
We can see that instanceX shares all four of its known features with instance3,
thus we will set the unknown variable according to instance3's defined value: 0
# The mean
We could calculate the dataset's mean value for this variable
(2/3 over the three known instances) and use the rounded value: 1
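
Both estimates above can be sketched in C. This is only a minimal illustration, not part of FANN: the helper names `column_mean` and `impute_nearest` are hypothetical, missing values are assumed to be encoded as `NAN`, and "similarity" is assumed to mean squared Euclidean distance over the features known in both instances.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>

#define N_FEATURES 5  /* features per instance, as in the example above */

/* Mean of the known (non-NAN) values in column `col`; NAN if none are known. */
double column_mean(double data[][N_FEATURES], size_t rows, size_t col)
{
    double sum = 0.0;
    size_t known = 0;

    assert(col < N_FEATURES);
    for (size_t r = 0; r < rows; ++r) {
        if (!isnan(data[r][col])) {
            sum += data[r][col];
            ++known;
        }
    }
    return known ? sum / (double)known : NAN;
}

/* Squared Euclidean distance over the features known in both rows. */
static double known_distance(const double *a, const double *b)
{
    double d = 0.0;

    for (size_t c = 0; c < N_FEATURES; ++c) {
        if (!isnan(a[c]) && !isnan(b[c])) {
            d += (a[c] - b[c]) * (a[c] - b[c]);
        }
    }
    return d;
}

/* Fill data[target][col] with the value from the most similar row that
 * has a known value in that column. Returns 1 on success, 0 if no other
 * row knows the value. */
int impute_nearest(double data[][N_FEATURES], size_t rows,
                   size_t target, size_t col)
{
    double best = DBL_MAX;
    double value = 0.0;
    int found = 0;

    assert(target < rows && col < N_FEATURES);
    for (size_t r = 0; r < rows; ++r) {
        if (r == target || isnan(data[r][col])) {
            continue;  /* skip the row itself and rows that cannot help */
        }
        double d = known_distance(data[target], data[r]);
        if (!found || d < best) {
            best = d;
            value = data[r][col];
            found = 1;
        }
    }
    if (found) {
        data[target][col] = value;
    }
    return found;
}
```

With the four instances above stored as rows 0 to 3 and the `?` stored as `NAN`, `column_mean(data, 4, 4)` gives 2/3, which rounds to 1 for a binary feature, while `impute_nearest(data, 4, 3, 4)` copies instance3's value 0 into instanceX. Once every gap is filled, the completed vectors can be passed to the network's training routine as usual.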

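(2) The EM (expectation-maximization) algorithm

This is a more sophisticated algorithm for finding approximate estimates of the missing data. Read an introduction to the algorithm here.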