I recently started using Python for machine learning.
I am working on a Python regression project to predict some values. The input is a dataset of 70 features, a mix of categorical and ordinal variables; the dependent variable is continuous.
The inputs are the data and the number of important variables to select.
I have a couple of questions, listed below.
1] Is there a way to do feature selection in TensorFlow using a forward-selection technique?
2] What are some alternatives to feature selection?
Any help would be greatly appreciated!
Answer 0 (score: 2)
To restate the question: I have N features (say N = 70) and I want to select the top K of them. (1) How can I do this in TensorFlow, and (2) what are the alternatives?

I'll show one approach that uses a variant of the L1 loss to drive the number of active features down to at most K. As for alternatives to feature selection, there are many, and the right one depends on what you want to achieve. If you can step outside TensorFlow, you can use a decision tree or a random forest and simply limit the number of features it is allowed to use to at most K. If you must stay in TensorFlow and want an alternative to picking the top K features, you can regularize your weights with dropout or an L2 loss instead. Again, it really depends on what you are trying to achieve. A minimal sketch of the random-forest alternative is shown right below.
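For instance, here is a minimal sketch of the random-forest route using scikit-learn (this is not part of the original answer; X, y, and K are placeholder names). SelectFromModel with max_features and threshold=-np.inf keeps exactly the K features with the highest importances:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

K = 10                         # number of features to keep (hypothetical)
X = np.random.rand(100, 70)    # placeholder data: 100 samples, 70 features
y = np.random.rand(100)        # placeholder continuous target

# Fit a forest, then keep the K features with the highest importances.
# threshold=-np.inf disables the importance cutoff, so exactly
# max_features features are selected.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, max_features=K, threshold=-np.inf)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)                     # (100, K)
print(selector.get_support(indices=True))   # indices of the kept features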
Suppose our TensorFlow graph is defined as follows (this uses the TensorFlow 1.x API)
import tensorflow as tf

size = 4
x_in = tf.placeholder(shape=[None, size], dtype=tf.float32)
y_in = tf.placeholder(shape=[None], dtype=tf.float32)
l1_weight = tf.placeholder(shape=[], dtype=tf.float32)

# The raw feature weights, initialized strictly positive.
m = tf.Variable(tf.random_uniform(shape=[size, 1], minval=0.1, maxval=0.9), dtype=tf.float32)
# The relu clamps any weight that goes negative to exactly zero,
# which is what lets the L1 penalty switch features off completely.
m = tf.nn.relu(m)
b = tf.Variable([-10], dtype=tf.float32)
predict = tf.squeeze(tf.nn.xw_plus_b(x_in, m, b))

# Mean squared error plus an L1 penalty on the (clamped) weights.
l1_loss = tf.reduce_sum(tf.abs(m))
loss = tf.reduce_mean(tf.square(y_in - predict)) + l1_loss * l1_weight
optimizer = tf.train.GradientDescentOptimizer(1e-4)
train = optimizer.minimize(loss)

# k counts how many feature weights are still non-zero.
m_all_0 = tf.zeros([size, 1], dtype=tf.float32)
zerod_feature_count = tf.reduce_sum(tf.cast(tf.equal(m, m_all_0), dtype=tf.float32))
k = size - zerod_feature_count
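As an aside, if you are on TensorFlow 2.x, where placeholders and sessions no longer exist, a rough sketch of the same idea might look like the following (names such as m_raw and train_step are my own, not from the original answer; x and y are assumed to be float32 tensors):

import tensorflow as tf  # TensorFlow 2.x

size = 4
m_raw = tf.Variable(tf.random.uniform([size, 1], minval=0.1, maxval=0.9))
b = tf.Variable([-10.0])
opt = tf.keras.optimizers.SGD(learning_rate=1e-4)

def train_step(x, y, l1_weight):
    with tf.GradientTape() as tape:
        m = tf.nn.relu(m_raw)  # clamp negative weights to exactly zero
        pred = tf.squeeze(tf.matmul(x, m) + b)
        loss = tf.reduce_mean(tf.square(y - pred)) + l1_weight * tf.reduce_sum(tf.abs(m))
    grads = tape.gradient(loss, [m_raw, b])
    opt.apply_gradients(zip(grads, [m_raw, b]))
    return loss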
Let's define some data to use with this
import numpy as np

# The first 100 rows of the classic Iris dataset: four features
# plus a binary class label in the last column.
data = np.array([[5.1,3.5,1.4,0.2,0],[4.9,3.0,1.4,0.2,0],[4.7,3.2,1.3,0.2,0],[4.6,3.1,1.5,0.2,0],[5.0,3.6,1.4,0.2,0],[5.4,3.9,1.7,0.4,0],[4.6,3.4,1.4,0.3,0],[5.0,3.4,1.5,0.2,0],[4.4,2.9,1.4,0.2,0],[4.9,3.1,1.5,0.1,0],[5.4,3.7,1.5,0.2,0],[4.8,3.4,1.6,0.2,0],[4.8,3.0,1.4,0.1,0],[4.3,3.0,1.1,0.1,0],[5.8,4.0,1.2,0.2,0],[5.7,4.4,1.5,0.4,0],[5.4,3.9,1.3,0.4,0],[5.1,3.5,1.4,0.3,0],[5.7,3.8,1.7,0.3,0],[5.1,3.8,1.5,0.3,0],[5.4,3.4,1.7,0.2,0],[5.1,3.7,1.5,0.4,0],[4.6,3.6,1.0,0.2,0],[5.1,3.3,1.7,0.5,0],[4.8,3.4,1.9,0.2,0],[5.0,3.0,1.6,0.2,0],[5.0,3.4,1.6,0.4,0],[5.2,3.5,1.5,0.2,0],[5.2,3.4,1.4,0.2,0],[4.7,3.2,1.6,0.2,0],[4.8,3.1,1.6,0.2,0],[5.4,3.4,1.5,0.4,0],[5.2,4.1,1.5,0.1,0],[5.5,4.2,1.4,0.2,0],[4.9,3.1,1.5,0.1,0],[5.0,3.2,1.2,0.2,0],[5.5,3.5,1.3,0.2,0],[4.9,3.1,1.5,0.1,0],[4.4,3.0,1.3,0.2,0],[5.1,3.4,1.5,0.2,0],[5.0,3.5,1.3,0.3,0],[4.5,2.3,1.3,0.3,0],[4.4,3.2,1.3,0.2,0],[5.0,3.5,1.6,0.6,0],[5.1,3.8,1.9,0.4,0],[4.8,3.0,1.4,0.3,0],[5.1,3.8,1.6,0.2,0],[4.6,3.2,1.4,0.2,0],[5.3,3.7,1.5,0.2,0],[5.0,3.3,1.4,0.2,0],[7.0,3.2,4.7,1.4,1],[6.4,3.2,4.5,1.5,1],[6.9,3.1,4.9,1.5,1],[5.5,2.3,4.0,1.3,1],[6.5,2.8,4.6,1.5,1],[5.7,2.8,4.5,1.3,1],[6.3,3.3,4.7,1.6,1],[4.9,2.4,3.3,1.0,1],[6.6,2.9,4.6,1.3,1],[5.2,2.7,3.9,1.4,1],[5.0,2.0,3.5,1.0,1],[5.9,3.0,4.2,1.5,1],[6.0,2.2,4.0,1.0,1],[6.1,2.9,4.7,1.4,1],[5.6,2.9,3.6,1.3,1],[6.7,3.1,4.4,1.4,1],[5.6,3.0,4.5,1.5,1],[5.8,2.7,4.1,1.0,1],[6.2,2.2,4.5,1.5,1],[5.6,2.5,3.9,1.1,1],[5.9,3.2,4.8,1.8,1],[6.1,2.8,4.0,1.3,1],[6.3,2.5,4.9,1.5,1],[6.1,2.8,4.7,1.2,1],[6.4,2.9,4.3,1.3,1],[6.6,3.0,4.4,1.4,1],[6.8,2.8,4.8,1.4,1],[6.7,3.0,5.0,1.7,1],[6.0,2.9,4.5,1.5,1],[5.7,2.6,3.5,1.0,1],[5.5,2.4,3.8,1.1,1],[5.5,2.4,3.7,1.0,1],[5.8,2.7,3.9,1.2,1],[6.0,2.7,5.1,1.6,1],[5.4,3.0,4.5,1.5,1],[6.0,3.4,4.5,1.6,1],[6.7,3.1,4.7,1.5,1],[6.3,2.3,4.4,1.3,1],[5.6,3.0,4.1,1.3,1],[5.5,2.5,4.0,1.3,1],[5.5,2.6,4.4,1.2,1],[6.1,3.0,4.6,1.4,1],[5.8,2.6,4.0,1.2,1],[5.0,2.3,3.3,1.0,1],[5.6,2.7,4.2,1.3,1],[5.7,3.0,4.2,1.2,1],[5.7,2.9,4.2,1.3,1],[6.2,2.9,4.3,1.3,1],[5.1,2.5,3.0,1.1,1],[5.7,2.8,4.1,1.3,1]])
x = data[:,0:4]
y = data[:,-1]
sess = tf.Session()
sess.run(tf.global_variables_initializer())
Let's define a reusable trial function that lets us test different weights for l1_loss. It first fits the model with no L1 penalty, then turns the penalty on and trains until at most 3 features remain non-zero.
def trial(weight=1):
    print("initial m,loss", sess.run([m, loss], feed_dict={x_in: x, y_in: y, l1_weight: 0}))
    # First fit the model with no L1 penalty at all.
    for _ in range(10000):
        sess.run(train, feed_dict={x_in: x, y_in: y, l1_weight: 0})
    print("after training m,loss", sess.run([m, loss], feed_dict={x_in: x, y_in: y, l1_weight: 0}))
    # Then turn the penalty on, stopping once at most 3 features survive.
    for _ in range(10000):
        sess.run(train, feed_dict={x_in: x, y_in: y, l1_weight: weight})
        if sess.run(k) <= 3:
            break
    print("after l1 loss m", sess.run([m, loss], feed_dict={x_in: x, y_in: y, l1_weight: weight}))
Then we give it a try
print "The number of non-zero parameters is",sess.run(k)
print "Doing a training session"
trial()
print "The number of non-zero parameters is",sess.run(k)
The results look good: the L1 penalty drives the weight of the fourth feature to exactly zero, leaving K = 3 non-zero features
The number of non-zero parameters is 4.0
Doing a training session
initial m,loss [array([[0.75030506],
[0.4959089 ],
[0.646675 ],
[0.44027993]], dtype=float32), 8.228347]
after training m,loss [array([[1.1898338 ],
[1.0033115 ],
[0.15164669],
[0.16414128]], dtype=float32), 0.68957466]
after l1 loss m [array([[1.250356 ],
[0.92532456],
[0.10235767],
[0. ]], dtype=float32), 2.9621665]
The number of non-zero parameters is 3.0
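If you want to read off which features survived, one way (not shown in the original answer) is to pull m out of the session and look at its non-zero entries:

import numpy as np

# m is the relu-clamped weight tensor, so any feature the L1 penalty
# switched off shows up as an exact zero.
selected = np.nonzero(sess.run(m).flatten())[0]
print("selected feature indices:", selected)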