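I am trying to implement the DOSNES algorithm from this publication, but in Python for a project. I found this Matlab Implementation, which works well, but I have probably mistranslated one or more steps in my code (mostly the axes, I guess), because I clearly do not get the same results. This is the part of the Matlab code I am struggling with: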
P(1:n + 1:end) = 0; % set diagonal to zero
P = 0.5 * (P + P'); % symmetrize P-values
P = max(P ./ sum(P(:)), realmin); % make sure P-values sum to one
const = sum(P(:) .* log(P(:))); % constant in KL divergence
ydata = .0001 * randn(n, no_dims);
y_incs = zeros(size(ydata));
gains = ones(size(ydata));
% Run the iterations
for iter=1:max_iter
    % Compute joint probability that point i and j are neighbors
    sum_ydata = sum(ydata .^ 2, 2);
    num = 1 ./ (1 + bsxfun(@plus, sum_ydata, bsxfun(@plus, sum_ydata', -2 * (ydata * ydata')))); % Student-t distribution
    num(1:n+1:end) = 0; % set diagonal to zero
    Q = max(num ./ sum(num(:)), realmin); % normalize to get probabilities
    % Compute the gradients (faster implementation)
    L = (P - Q) .* num;
    y_grads = 4 * (diag(sum(L, 1)) - L) * ydata;
    % Update the solution
    gains = (gains + .2) .* (sign(y_grads) ~= sign(y_incs)) ... % note that the y_grads are actually -y_grads
          + (gains * .8) .* (sign(y_grads) == sign(y_incs));
    gains(gains < min_gain) = min_gain;
    y_incs = momentum * y_incs - epsilon * (gains .* y_grads);
    ydata = ydata + y_incs;
    % Spherical projection
    ydata = bsxfun(@minus, ydata, mean(ydata, 1));
    r_mean = mean(sqrt(sum(ydata.^2,2)),1);
    ydata = bsxfun(@times, ydata, r_mean./ sqrt(sum(ydata.^2,2)) );
    % Update the momentum if necessary
    if iter == mom_switch_iter
        momentum = final_momentum;
    end
    % Print out progress
    if ~rem(iter, 10)
        cost = const - sum(P(:) .* log(Q(:)));
        disp(['Iteration ' num2str(iter) ': error is ' num2str(cost)]);
    end
end
Here is my Python version:
no_dims = 3
n = X.shape[0]
min_gain = 0.01
momentum = 0.5
final_momentum = 0.8
epsilon = 500
mom_switch_iter = 250
max_iter = 1000
P[np.diag_indices_from(P)] = 0.
P = ( P + P.T )/2
P = np.max(P / np.sum(P), axis=0)
const = np.sum( P * np.log(P) )
ydata = 1e-4 * np.random.random(size=(n, no_dims))
y_incs = np.zeros(shape=ydata.shape)
gains = np.ones(shape=ydata.shape)
for iter in range(max_iter):
    sum_ydata = np.sum(ydata**2, axis = 1)
    bsxfun_1 = sum_ydata.T + -2*np.dot(ydata, ydata.T)
    bsxfun_2 = sum_ydata + bsxfun_1
    num = 1. / ( 1 + bsxfun_2 )
    num[np.diag_indices_from(num)] = 0.
    Q = np.max(num / np.sum(num), axis=0)
    L = (P - Q) * num
    t = np.diag( L.sum(axis=0) ) - L
    y_grads = 4 * np.dot( t , ydata )
    gains = (gains + 0.2) * ( np.sign(y_grads) != np.sign(y_incs) ) \
        + (gains * 0.8) * ( np.sign(y_grads) == np.sign(y_incs) )
    # gains[np.where(np.sign(y_grads) != np.sign(y_incs))] += 0.2
    # gains[np.where(np.sign(y_grads) == np.sign(y_incs))] *= 0.8
    gains = np.clip(gains, a_min = min_gain, a_max = None)
    y_incs = momentum * y_incs - epsilon * gains * y_grads
    ydata += y_incs
    ydata -= ydata.mean(axis=0)
    alpha = np.sqrt(np.sum(ydata ** 2, axis=1))
    r_mean = np.mean(alpha)
    ydata = ydata * (r_mean / alpha).reshape(-1, 1)
    if iter == mom_switch_iter:
        momentum = final_momentum
    if iter % 10 == 0:
        cost = const - np.sum( P * np.log(Q) )
        print( "Iteration {} : error is {}".format(iter, cost) )
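If you want to experiment, you can download the repository here, which uses the Iris dataset and an attached library. test.py is my test implementation with the Iris dataset, and visu.py reproduces the paper's result on the MNIST dataset, but limited to 1000k random points.
Thank you very much for your support,
Nicolas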
Here is the final code, which works as expected:
import numpy as np
from scipy.special import xlogy

# P, n, no_dims, min_gain, momentum, final_momentum, epsilon,
# mom_switch_iter and max_iter are defined as in the first version above.
P[np.diag_indices_from(P)] = 0.
P = ( P + P.T )/2
P = P / np.sum(P)
const = np.sum(xlogy(P, P))  # xlogy returns 0 where P == 0
ydata = 1e-4 * np.random.random(size=(n, no_dims))
y_incs = np.zeros(shape=ydata.shape)
gains = np.ones(shape=ydata.shape)
for iter in range(max_iter):
    sum_ydata = np.sum(ydata**2, axis = 1)
    # Pairwise squared distances: sum_ydata[i] + sum_ydata[j] - 2 * <y_i, y_j>.
    # A 1-D array is unchanged by .T, so the row and column vectors that
    # Matlab's bsxfun combines need explicit axes here.
    bsxfun_1 = sum_ydata[np.newaxis, :] + -2*np.dot(ydata, ydata.T)
    bsxfun_2 = sum_ydata[:, np.newaxis] + bsxfun_1
    num = 1. / ( 1 + bsxfun_2 )  # Student-t kernel
    num[np.diag_indices_from(num)] = 0.
    Q = num / np.sum(num)
    L = (P - Q) * num
    t = np.diag( L.sum(axis=0) ) - L
    y_grads = 4 * np.dot( t , ydata )
    gains = (gains + 0.2) * ( np.sign(y_grads) != np.sign(y_incs) ) \
        + (gains * 0.8) * ( np.sign(y_grads) == np.sign(y_incs) )
    gains = np.clip(gains, a_min = min_gain, a_max = None)
    y_incs = momentum * y_incs - epsilon * gains * y_grads
    ydata += y_incs
    # Spherical projection: center, then rescale every point to the mean radius
    ydata -= ydata.mean(axis=0)
    alpha = np.sqrt(np.sum(ydata ** 2, axis=1))
    r_mean = np.mean(alpha)
    ydata = ydata * (r_mean / alpha).reshape(-1, 1)
    if iter == mom_switch_iter:
        momentum = final_momentum
    if iter % 10 == 0:
        cost = const - np.sum( xlogy(P, Q) )
        print( "Iteration {} : error is {}".format(iter, cost) )
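A note for anyone porting the bsxfun line: in NumPy a 1-D array is unchanged by .T, so the row/column distinction that Matlab makes with sum_ydata' has to be written with explicit axes. The snippet below is only a minimal sketch of the pairwise squared-distance step under that assumption; the helper name pairwise_sq_dists and the random test data are made up for illustration and are not part of the Matlab implementation.

```python
import numpy as np

def pairwise_sq_dists(y):
    """Squared Euclidean distances between all rows of y, shape (n, n)."""
    sq_norms = np.sum(y ** 2, axis=1)  # shape (n,)
    # Column vector + row vector broadcasts to (n, n):
    # d[i, j] = |y_i|^2 + |y_j|^2 - 2 * <y_i, y_j>
    return sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2.0 * (y @ y.T)

rng = np.random.default_rng(0)
y = rng.standard_normal((5, 3))        # illustrative data only

d = pairwise_sq_dists(y)
num = 1.0 / (1.0 + d)                  # Student-t kernel, like `num` in the Matlab code
np.fill_diagonal(num, 0.0)             # same effect as num(1:n+1:end) = 0

# Brute-force reference to check the broadcasting
brute = ((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
```

Comparing d against brute with np.allclose, and checking that d is symmetric, is a cheap way to catch an axis mistake before running the full optimization.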
Answer 0 (score: 1)
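At the very beginning you appear to have replaced a non-reducing max in Matlab (it takes two arguments, so it compares them element by element and returns a full-size P) with a reducing max in Python (axis=0 reduces along that axis, so the result has one dimension fewer).

My advice, however, is to leave out the max altogether. It looks very much like an amateurish attempt to sidestep the fact that p log p is defined at 0 only as the limit p->0. Using L'Hopital's rule, that limit can be shown to be 0, whereas the computer returns NaN when asked to compute 0 * log(0).

The proper way to go about this is to use scipy.special.xlogy, which treats 0 correctly.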