An efficient dense counterpart of scipy.sparse.diags

Date: 2019-04-07 12:25:52

Tags: python numpy scipy

scipy.sparse.diags lets me pass in several diagonal vectors together with their offsets and builds a matrix such as

import numpy as np
from scipy.sparse import diags
vec = np.ones((5,))
vec2 = vec + 1
diags([vec, vec2], [-2, 2])

I am looking for an efficient way to do the same, but building a dense matrix instead of a DIA one. np.diag only supports a single diagonal. What is an efficient way to build a dense matrix from several diagonal vectors?

Expected output: the same as np.array(diags([vec, vec2], [-2, 2]).todense())
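
For reference, the dense result for the small example above is a 7x7 array (scipy infers the shape from len(vec) + |offset|). A minimal check of my own, using toarray(), which returns the same array as the np.array(...todense()) round trip:

import numpy as np
from scipy.sparse import diags

vec = np.ones((5,))
vec2 = vec + 1

# ones on the -2 diagonal, twos on the +2 diagonal, zeros elsewhere
expected = diags([vec, vec2], [-2, 2]).toarray()
print(expected.shape)   # (7, 7)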

2 answers:

Answer 0 (score: 2):

One approach is to index into the flattened output array with a step of N+1 (see the short illustration after the timings below):

import numpy as np
from scipy.sparse import diags
from timeit import timeit

def diags_pp(vecs, offs, dtype=float, N=None):
    # infer the output size from the first diagonal if not given
    if N is None:
        N = len(vecs[0]) + abs(offs[0])
    out = np.zeros((N, N), dtype)
    outf = out.reshape(-1)  # flat view sharing memory with out
    for vec, off in zip(vecs, offs):
        # in the flattened row-major array consecutive elements of any
        # diagonal are N+1 apart; the offset only shifts the start index
        if off < 0:
            outf[-N*off::N+1] = vec
        else:
            outf[off:N*(N-off):N+1] = vec
    return out

def diags_sp(vecs, offs):
    # reference implementation: build the sparse matrix, then densify
    return diags(vecs, offs).A

for N, k in [(10, 2), (100, 20), (1000, 200)]:
    print(N)
    O = np.arange(-k,k)
    D = [np.arange(1, N+1-abs(o)) for o in O]
    for n, f in list(globals().items()):
        if n.startswith('diags_'):
            print(n.replace('diags_', ''), timeit(lambda: f(D, O), number=10000//N)*N)
            if n != 'diags_sp':
                assert np.all(f(D, O) == diags_sp(D, O))

Sample run:

10
pp 0.06757194991223514
sp 1.9529316504485905
100
pp 0.45834919437766075
sp 4.684177896706387
1000
pp 23.397524026222527
sp 170.66762899048626
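
To see why the N+1 step in diags_pp works: in the flattened row-major view of an N x N array, consecutive elements of any diagonal sit exactly N+1 positions apart; the offset only determines where the run starts (and, for positive offsets, where it has to stop). A small standalone illustration of my own, not part of the answer:

import numpy as np

N = 5
a = np.zeros((N, N), dtype=int)
af = a.reshape(-1)       # flat view sharing memory with a

af[0::N+1] = 1           # main diagonal, starts at flat index 0
af[2:N*(N-2):N+1] = 2    # +2 diagonal, starts at flat index 2
af[N*1::N+1] = 3         # -1 diagonal, starts at flat index N*1

print(a)
# [[1 0 2 0 0]
#  [3 1 0 2 0]
#  [0 3 1 0 2]
#  [0 0 3 1 0]
#  [0 0 0 3 1]]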

Answer 1 (score: 1):

Using Paul Panzer's (10, 2) case:

In [107]: O                                                                     
Out[107]: array([-2, -1,  0,  1])
In [108]: D                                                                     
Out[108]: 
[array([1, 2, 3, 4, 5, 6, 7, 8]),
 array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
 array([1, 2, 3, 4, 5, 6, 7, 8, 9])]

The diagonals have different lengths. sparse.diags converts this ragged input into a sparse.dia_matrix:

In [109]: M = sparse.diags(D,O)
In [110]: M
Out[110]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
	with 36 stored elements (4 diagonals) in DIAgonal format>
In [111]: M.data
Out[111]: 
array([[ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  0.,  0.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.,  0.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]])

Here the ragged list of diagonals has been converted into a padded 2d array. That can be a convenient way of specifying the diagonals, but it isn't particularly efficient. For most calculations it has to be converted to csr format:

In [112]: timeit sparse.diags(D,O)                                              
99.8 µs ± 3.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [113]: timeit sparse.diags(D,O, format='csr')                                
371 µs ± 155 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using np.diag I can construct the same dense array with an iteration:

np.add.reduce([np.diag(v,k) for v,k in zip(D,O)])

In [117]: timeit np.add.reduce([np.diag(v,k) for v,k in zip(D,O)])              
39.3 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

and with Paul's function:

In [120]: timeit diags_pp(D,O)
12.3 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The key step inside np.diag is a simple assignment:

res[:n-k].flat[i::n+1] = v

which is essentially the same as Paul's outf assignments. So the functionality is basically the same - each diagonal is assigned through a strided slice; Paul just streamlines it into a single output array.

Creating the dia-format M.data array likewise requires copying each diagonal into a 2d array - just with different slices.
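
For completeness, the np.diag iteration above can be wrapped as a small helper; a minimal sketch of my own (the name diags_dense_sum is made up, not from either answer), cross-checked against sparse.diags:

import numpy as np
from scipy.sparse import diags

def diags_dense_sum(vecs, offs):
    # hypothetical helper: one dense N x N array per diagonal, summed together;
    # simple, but it allocates len(vecs) temporaries, unlike the single-array diags_pp
    return np.add.reduce([np.diag(v, k) for v, k in zip(vecs, offs)])

# the (10, 2) test case from the answers
O = [-2, -1, 0, 1]
D = [np.arange(1, 11 - abs(o)) for o in O]
assert np.array_equal(diags_dense_sum(D, O), diags(D, O).toarray())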