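SPAMS: spams.lassoWeighted gives wrong output?

Asked: 2015-03-13 21:20:22

Tags: python optimization sparse-matrix least-squares

Hello everyone. I cannot make sense of the output of the spams.lassoWeighted function. This is what happens if you run the example from their page http://spams-devel.gforge.inria.fr/doc-python/html/doc_spams005.html#sec16: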

import time

import numpy as np
import spams

# Assumption: the snippet as posted never defines myfloat; float64 is used here.
myfloat = np.float64

np.random.seed(0)
print("test lasso weighted")
##############################################
# Decomposition of a large number of signals
##############################################
# data generation: 10000 signals of dimension 64, each scaled to unit l2 norm
X = np.asfortranarray(np.random.normal(size=(64,10000)))
X = np.asfortranarray(X / np.tile(np.sqrt((X*X).sum(axis=0)),(X.shape[0],1)),dtype= myfloat)
# dictionary of 256 atoms, each scaled to unit l2 norm
D = np.asfortranarray(np.random.normal(size=(64,256)))
D = np.asfortranarray(D / np.tile(np.sqrt((D*D).sum(axis=0)),(D.shape[0],1)),dtype= myfloat)
param = { 'L' : 20,
    'lambda1' : 0.15, 'numThreads' : 8, 'mode' : spams.PENALTY}
# one weight per (atom, signal) pair
W = np.asfortranarray(np.random.random(size = (D.shape[1],X.shape[1])),dtype= myfloat)
tic = time.time()
alpha = spams.lassoWeighted(X,D,W,**param)
tac = time.time()
t = tac - tic
print("%f signals processed per second\n" % (float(X.shape[1]) / t))

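As output I get a 64x1 matrix that contains only one non-zero element. It is the same in every case: each signal only ever gets a single non-zero element. I don't understand why the solution of ||x − Dα||₂² + λ||diag(w)α||₁ would have only one non-zero element?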

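1 Answer:

Answer 0 (score: 0)

The output matrix alpha must have 10000 columns, because X is 64x10000, and 256 rows, because the dictionary D is 64x256 (each signal is approximated as x ≈ Dα). So alpha should be 256x10000. According to the Inria SPAMS documentation, the output of lassoWeighted is: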

Output:
   A: double sparse p x n matrix (output coefficients)
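
The shape argument can be checked without SPAMS. The following is only a sketch with dense placeholder arrays of zeros (the real output is sparse); it shows that a coefficient matrix of shape (256, 10000) is the only one for which D times alpha matches X:

```python
import numpy as np

# Placeholder arrays with the same shapes as in the example (values are irrelevant here).
X = np.zeros((64, 10000))   # 10000 signals of dimension 64, one per column
D = np.zeros((64, 256))     # dictionary with 256 atoms of dimension 64

# One coefficient column per signal, one coefficient row per dictionary atom.
expected_alpha_shape = (D.shape[1], X.shape[1])
alpha_placeholder = np.zeros(expected_alpha_shape)

# The reconstruction D @ alpha must have the same shape as X.
reconstruction = D.dot(alpha_placeholder)
print("alpha shape: %s" % (expected_alpha_shape,))
print("reconstruction shape: %s" % (reconstruction.shape,))
```

A 64x1 result therefore cannot be the coefficient matrix for this input.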

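The parameter lambda1 controls the number of non-zeros, because it multiplies the l1 regularizer: the larger it is, the sparser the solution. Their implementation also has the parameter L, which is the maximum number of non-zeros allowed in each sparse vector.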
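
To see the effect of lambda1 in isolation, here is a minimal NumPy sketch of the weighted lasso for a single signal. It uses plain ISTA (proximal gradient descent), which is not the solver SPAMS uses internally (as far as I know, SPAMS relies on a LARS/homotopy method), and it assumes the objective 0.5*||x - Dα||₂² + λ·Σ w_j|α_j|. The function name weighted_lasso_ista, the weight range, and the lambda values are illustrative choices, not part of SPAMS:

```python
import numpy as np

def weighted_lasso_ista(x, D, w, lam, n_iter=500):
    """Approximately minimise 0.5*||x - D a||_2^2 + lam * sum_j w[j]*|a[j]| with ISTA."""
    # Step size 1/Lipschitz, where Lipschitz = squared spectral norm of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic data-fit term.
        z = a - step * D.T.dot(D.dot(a) - x)
        # Weighted soft-thresholding: proximal step for the weighted l1 penalty.
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)
    return a

rng = np.random.RandomState(0)
D = rng.normal(size=(64, 256))
D /= np.sqrt((D * D).sum(axis=0))        # unit-norm atoms, as in the example
x = rng.normal(size=64)
x /= np.linalg.norm(x)                   # unit-norm signal
w = rng.uniform(0.5, 1.5, size=256)      # assumed weight range, kept away from zero

# Above this value of lambda the all-zero vector is the solution.
lam_max = np.max(np.abs(D.T.dot(x)) / w)

nnz_by_lambda = {}
for lam in (0.05, 0.15, 2.0 * lam_max):
    a = weighted_lasso_ista(x, D, w, lam)
    nnz_by_lambda[lam] = int(np.count_nonzero(a))
    print("lambda = %.3f -> %d non-zeros out of %d" % (lam, nnz_by_lambda[lam], a.size))
```

This sketch has no equivalent of L; in SPAMS, L is an additional hard cap applied on top of the penalty.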

So, if I run the following:

import time

import numpy as np
import spams

np.random.seed(0)
print("test lasso weighted")
# 10000 unit-norm signals of dimension 64
X = np.asfortranarray(np.random.normal(size=(64,10000)))
X = np.asfortranarray(X / np.tile(np.sqrt((X*X).sum(axis=0)),(X.shape[0],1)),dtype=float)
# dictionary of 256 unit-norm atoms
D = np.asfortranarray(np.random.normal(size=(64,256)))
D = np.asfortranarray(D / np.tile(np.sqrt((D*D).sum(axis=0)),(D.shape[0],1)),dtype=float)
param = { 'L' : 20,
    'lambda1' : 0.15, 'numThreads' : 8, 'mode' : spams.PENALTY}
W = np.asfortranarray(np.random.random(size = (D.shape[1],X.shape[1])),dtype=float)
tic = time.time()
alpha = spams.lassoWeighted(X,D,W,**param)
tac = time.time()
t = tac - tic
# alpha is a SciPy sparse matrix; count the stored non-zeros in each column (one column per signal)
non_zero = alpha.getnnz(axis=0)
print('Shape Output Matrix: %s' % (alpha.shape,))
print('Min non-zeros of %d columns: %d' % (alpha.shape[1], np.min(non_zero)))
print('Max non-zeros of %d columns: %d' % (alpha.shape[1], np.max(non_zero)))
print("%f signals processed per second\n" % (float(X.shape[1]) / t))

I get:

test lasso weighted
Shape Output Matrix: (256, 10000)
Min non-zeros of 10000 columns: 20
Max non-zeros of 10000 columns: 20
7691.130169 signals processed per second

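So each of the 10000 sparse approximations (each one is really a 256x1 vector) has exactly 20 non-zeros, which is the cap set by L.

If we change the params to allow at most 5 non-zeros: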

param = { 'L' : 5,
    'lambda1' : 0.15, 'numThreads' : 8, 'mode' : spams.PENALTY}

Output:

test lasso weighted
Shape Output Matrix: (256, 10000)
Min non-zeros of 10000 columns: 5
Max non-zeros of 10000 columns: 5
26600.540090 signals processed per second

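If you want denser sparse approximations (columns of alpha), you can make L larger or remove it altogether, so that only lambda1 controls the sparsity: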

param = { 'lambda1' : 0.15, 'numThreads' : 8, 'mode' : spams.PENALTY}

Output:

test lasso weighted
Shape Output Matrix: (256, 10000)
Min non-zeros of 10000 columns: 40
Max non-zeros of 10000 columns: 61
1697.975321 signals processed per second
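
A quick way to read the three runs above: when every column has exactly L non-zeros, the cap L is what determines the sparsity, not lambda1. The small check below is only an illustration of that reading; cap_is_binding is a hypothetical helper, and the numbers are the min/max counts reported above:

```python
def cap_is_binding(min_nnz, max_nnz, L):
    """Return True when every column has exactly L non-zeros, i.e. the cap L is limiting."""
    return L is not None and min_nnz == max_nnz == L

# Min/max non-zeros per column from the three runs reported above (L=None means L was omitted).
runs = [
    {"L": 20, "min_nnz": 20, "max_nnz": 20},
    {"L": 5, "min_nnz": 5, "max_nnz": 5},
    {"L": None, "min_nnz": 40, "max_nnz": 61},
]

binding = [cap_is_binding(r["min_nnz"], r["max_nnz"], r["L"]) for r in runs]
for run, is_binding in zip(runs, binding):
    print("L=%s: cap binding = %s" % (run["L"], is_binding))
```

With lambda1 = 0.15 alone the columns have between 40 and 61 non-zeros, so any L below 40 truncates every column in this example.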