Question

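I am working on a problem that requires a partial permutation sort by magnitude in Julia. If x is a vector of dimension p, what I need are the first k indices corresponding to the k components of x that would come first in a partial sort of x by absolute value.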

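See Julia's sorting functions. Basically, I want a cross between select! and sortperm. When Julia 0.4 is released, I will be able to get the same answer by applying sortperm! to the vector of indices and taking the first k of them. However, using sortperm! is not ideal, because it also sorts the remaining p-k indices of x, which I do not need.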

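What is the most memory-efficient way to do a partial permutation sort? I hacked together a solution by looking at the sortperm source code. However, since I am not familiar with the ordering modules that Julia uses there, I am not sure whether my approach is a smart one.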

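One important detail: I can ignore repeats or ambiguities here. In other words, I do not care how abs() orders the indices of two components such as 2 and -2. My actual code uses floating-point values, so exact equality never occurs in practice.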
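To make the ambiguity concrete, here is a small sketch written against current Julia (1.x) syntax rather than the 0.3/0.4 API discussed in this thread. The vector `y` is a made-up example. With tied magnitudes, more than one index order is a valid answer, so comparing magnitudes is a safer check than comparing indices.

```julia
# Hypothetical example vector: 2 and -2 tie in magnitude.
y = [2, -5, -2, 1]

# Full permutation sort by magnitude, largest first.
perm = sortperm(y, by = abs, rev = true)

# Index 2 (value -5) must come first; indices 1 and 3 may follow in either order.
top3 = perm[1:3]

# Comparing magnitudes instead of indices is insensitive to how the tie is broken.
magnitudes = abs.(y[top3])
println(magnitudes)   # prints [5, 2, 2]
```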

# initialize a vector for testing
x  = [-3,-2,4,1,0,-1]
x2 = copy(x)
k  = 3    # num components desired in partial sort
p  = 6    # num components in x, x2

# what are the indices that sort x by magnitude?
indices = sortperm(x, by = abs, rev = true)

# now perform partial sort on x2
select!(x2, k, by = abs, rev = true)

# check if first k components are sorted here
# should evaluate to "true"
isequal(x2[1:k], x[indices[1:k]])

# now try my partial permutation sort
# I only need indices2[1:k] at end of day!
indices2 = collect(1:p)
select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x))

# same result? should evaluate to "true"
isequal(indices2[1:k], indices[1:k])

Edit: With the suggested code, we can briefly compare performance on much larger vectors:

p = 10000; k = 100;    # asking for largest 1% of components
x  = randn(p); x2 = copy(x);

# run following code twice for proper timing results
@time (indices = sortperm(x, by = abs, rev = true); indices[1:k]);
@time (indices2 = collect(1:p); select!(indices2, 1:k, 1, p, Base.Perm(Base.ord(isless, abs, true, Base.Forward), x)));
@time selectperm(x,k);

My output:

Answer 1

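The following version appears to be relatively space-efficient, since it uses only an integer array of the same length as the input array: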

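function selectperm(x, k)
    # use a scalar position when k == 1; the range 1:1 raised a BoundsError
    if k > 1
        kk = 1:k
    else
        kk = 1
    end
    z = collect(1:length(x))    # index vector to be partially sorted
    # order the indices by the magnitude of the entries they point to, largest first
    return select!(z, kk, by = (i) -> abs(x[i]), rev = true)
end
x  = [-3,-2,4,1,0,-1]

k  = 3    # num components desired in partial sort
print(selectperm(x,k))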

The output is:

[3,1,2]

...as expected.

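I am not sure whether it uses less memory than the originally proposed solution (I suspect the memory use is similar), but the code may be clearer, and it produces only the first k indices, whereas the original solution produces all p indices.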
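For readers on Julia 0.7 or later, where `select!` was renamed `partialsort!`, Base provides `partialsortperm`, which covers this use case directly. The following is a minimal sketch under that assumption, not the method used in this answer; the helper name `topk_by_magnitude` is hypothetical. As far as I understand, it still works on an index vector as long as `x`, so memory use should be comparable to the version above.

```julia
# Hypothetical helper: indices of the k largest-magnitude entries of x,
# ordered from largest to smallest magnitude. Assumes Julia 0.7 or later.
function topk_by_magnitude(x::AbstractVector, k::Integer)
    # 1:k asks for the first k positions of the magnitude-sorted order.
    return partialsortperm(x, 1:k, by = abs, rev = true)
end

x = [-3, -2, 4, 1, 0, -1]
idx = topk_by_magnitude(x, 3)
println(idx)   # prints [3, 1, 2]
```

Only the first k positions are guaranteed to be in order, which is exactly the partial permutation sort asked for in the question.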

(Edit)

Modified selectperm() to handle the BoundsError that occurred when calling it with k=1.
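Handling k = 1 separately has a side effect worth knowing about: a scalar position and a one-element range return different kinds of result. The sketch below illustrates that distinction with `partialsortperm`, which assumes Julia 0.7 or later rather than the version used in this thread.

```julia
x = [-3, -2, 4, 1, 0, -1]

# Scalar k: a single index (an Int), here the position of the largest magnitude.
single = partialsortperm(x, 1, by = abs, rev = true)

# Range k: a one-element collection of indices.
ranged = partialsortperm(x, 1:1, by = abs, rev = true)

println(single)            # prints 3
println(collect(ranged))   # prints [3]
```

Callers that always index into the result may prefer the range form so that the return type does not depend on k.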

Efficient partial permutation sort in Julia
