Why does my randomized SVD implementation use so much memory?

Asked: 2018-11-05 21:05:56

Tags: performance julia

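I have a Julia implementation of randomized SVD (below), based on the paper Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. If you are curious, see the algorithm on page 9.

I expected randomized SVD to be more efficient than a full SVD on large datasets, but it is slightly slower and uses way more memory. Here are my performance statistics from @time: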

 SVD:  16.331761 seconds (17 allocations: 763.184 MiB, 0.82% gc time)
 RSVD: 17.009699 seconds (38 allocations: 1.074 GiB, 0.83% gc time)

Note that my randomized SVD uses more than 1 GiB of memory, and I don't know why. Here is my implementation:

using Distributions
using LinearAlgebra

# ------------------------------------------------------------------------------

function find_Q(A, l)
    #=
    Given an m × n matrix A, and an integer l, compute an m × l orthonormal
    matrix Q whose range approximates the range of A.
    =#
    m, n = size(A)
    Ω = rand(Normal(), n, l)
    Y = A * Ω
    Q, R = qr(Y)
    return Q
end

# ------------------------------------------------------------------------------

function randomized_SVD(A, k)
    #=
    Given an m × n matrix A, a target number k of singular vectors, and an
    exponent q (say q = 1 or q = 2), this procedure computes an approximate
    rank-2k factorization UΣVt, where U and V are orthonormal and Σ is
    nonnegative and diagonal.
    =#
    Q = find_Q(A, 2*k)
    B = Q' * A
    S, Σ, V = svd(B)    # destructuring an SVD yields U, S, V (V, not its transpose)
    U = Q * S
    return U, Σ, V'
end

# ------------------------------------------------------------------------------

m = 2000
n = 20000
k = 10
# Construct low-rank matrix
A = rand(m, k) * rand(k, n)
println("Rank of A: ", rank(A))
println("Size of A: ", size(A))

println("Throwaway test:")
@time svd(A)
@time randomized_SVD(A, k)
println("Actual test:")
@time svd(A)
@time randomized_SVD(A, k)

println("Completed")

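Note that I call @time twice, following the Julia documentation, which says:

On the first call (@time sum_global()) the function gets compiled. (If you've not yet used @time in this session, it will also compile functions needed for timing.) You should not take the results of this run seriously.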
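One thing worth checking in the implementation above, offered as a guess rather than a confirmed diagnosis: the Q returned by qr is stored in a compact form that acts like a square m × m operator, so Q' * A would be m × n (the same size as A) instead of the small 2k × n matrix the algorithm expects. The sketch below compares the two shapes. The sizes and the names Q_compact, Q_thin, B_full and B_thin are made up for illustration.

```julia
using LinearAlgebra

# Small sizes chosen only for illustration (the question uses 2000 × 20000).
m, n, l = 200, 2000, 20
A = rand(m, 10) * rand(10, n)    # low-rank test matrix, built as in the question
Y = A * randn(n, l)              # random projection of the range of A

F = qr(Y)
Q_compact = F.Q                  # compact representation; multiplies like an m × m matrix
Q_thin = Matrix(F.Q)             # explicit m × l matrix with orthonormal columns

B_full = Q_compact' * A          # same size as A
B_thin = Q_thin' * A             # the small l × n matrix

println("size with compact Q: ", size(B_full))
println("size with thin Q:    ", size(B_thin))
```

If the sizes differ in this way on your Julia version, returning Matrix(Q) from find_Q would be the change to try, since svd(B) would then run on a much smaller matrix.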

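1 Answer:

Answer 0 (score: 3)

As a side note: the Julia documentation advises against using the @time macro for benchmarking. It is better to use the @benchmark macro from the BenchmarkTools.jl package.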
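A minimal sketch of what that could look like, assuming BenchmarkTools.jl is installed (it is a third-party package, not part of the standard library). The matrix size here is arbitrary and much smaller than in the question, only to keep the run short.

```julia
using BenchmarkTools   # third-party: import Pkg; Pkg.add("BenchmarkTools")
using LinearAlgebra

# Small low-rank matrix, chosen only for illustration.
A = rand(200, 10) * rand(10, 2000)

# `$A` interpolates the global variable so that looking it up
# is not included in the measurement.
result = @benchmark svd($A)

# Shows the timing distribution along with memory and allocation estimates.
display(result)
```

@benchmark runs the expression many times, so compilation overhead from the first call does not dominate the reported numbers and there is no need for a separate throwaway run.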