I have a list of sparse symmetric matrices sigma such that
len(sigma) = N
and, for all i, j, k,
sigma[i].shape[0] == sigma[i].shape[1] = m # Square
sigma[i][j,k] == sigma[i][k,j] # Symmetric
I have an indexing array P such that
P.shape[0] = N
P.shape[1] = k
My objective is to extract the k x k dense submatrices of sigma[i], using the indices given by P[i,:]. This can be done as follows:
sub_matrices = np.empty([N,k,k])
for i in range(N):
    sub_matrices[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()
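However, note that while k is small, N (and m) are very large. If the sparse symmetric matrices are stored in CSR format, this takes a long time. I feel there must be a better solution. For example, is there a sparse format that works well for symmetric matrices that need to be sliced in both dimensions?
I am using Python, but I am open to suggestions for any C library that I can interface with through Cython.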
EXTRA
Note that my current Cython approach is as follows:
cimport cython
import numpy as np
cimport numpy as np
@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           long[:,:] P,
                           double[:,:,:] sub_matrices):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
    """
    # Create variables for keeping code tidy
    cdef long N = P.shape[0]
    cdef long k = P.shape[1]
    cdef long i
    cdef long j
    cdef long index_pointer
    cdef long sparse_row_pointer
    # Create objects for holding sparse matrix data
    cdef double[:] data
    cdef long[:] indices
    cdef long[:] indptr
    # Object for the ordered P
    cdef long[:] perm
    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0
    for i in range(N):
        # Sort the P
        perm = np.argsort(P[i,:])
        # Get the sparse matrix values
        data = sigma[i].data
        indices = sigma[i].indices.astype(long)
        indptr = sigma[i].indptr.astype(long)
        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            # sigma[P[i, perm[j], :]
            # against
            # P[i,:]
            # To do this we need our sparse row vector with columns
            # indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            # data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            # P[i, perm[:]]
            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of
            # an equality we add the data to sub_matrices[i,:,:] and
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[j]]]
            while ((index_pointer < k) and (sparse_row_pointer < indptr[P[i, perm[j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[j], perm[index_pointer]] = \
                        data[sparse_row_pointer]
                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1
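The merge step described in the comments above can be sketched in plain Python, which is handy as a slow reference when checking a compiled version on tiny inputs. This is only an illustration: the function name extract_submatrix and the toy 4 x 4 matrix are made up for the example, and the sketch assumes that the column indices inside each CSR row are sorted and unique.

```python
def extract_submatrix(data, indices, indptr, p):
    """Return the dense k x k block A[p][:, p] of a CSR matrix A.

    Assumes the column indices within each CSR row are sorted and unique.
    """
    k = len(p)
    # perm lists the positions of p in ascending order of their values.
    perm = sorted(range(k), key=lambda t: p[t])
    sub = [[0.0] * k for _ in range(k)]
    for j in range(k):
        row = p[perm[j]]
        index_pointer = 0
        sparse_row_pointer = indptr[row]
        row_end = indptr[row + 1]
        # Walk the sorted index vector and the sorted CSR row together.
        while index_pointer < k and sparse_row_pointer < row_end:
            col = indices[sparse_row_pointer]
            wanted = p[perm[index_pointer]]
            if col == wanted:
                sub[perm[j]][perm[index_pointer]] = data[sparse_row_pointer]
                # p may repeat a value, the CSR row cannot, so only the
                # index pointer advances on a match.
                index_pointer += 1
            elif col > wanted:
                index_pointer += 1
            else:
                sparse_row_pointer += 1
    return sub


# Toy symmetric matrix (hypothetical data) in CSR form:
# [[1, 0, 2, 0],
#  [0, 0, 3, 0],
#  [2, 3, 0, 4],
#  [0, 0, 4, 5]]
data = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 4.0, 5.0]
indices = [0, 2, 2, 0, 1, 3, 2, 3]
indptr = [0, 2, 3, 6, 8]

# The index vector deliberately contains a repeated entry.
block = extract_submatrix(data, indices, indptr, [2, 0, 2])
print(block)
```

The repeated index in the example exercises the case the comments call out: on a match only the index pointer moves, so both copies of index 2 pick up the same stored value.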
I believe this may be inefficient because np.argsort is called frequently on relatively small vectors, and I would like to swap it for a C implementation. I also do not take advantage of parallel processing, which could give a speed-up across the N sparse matrices. Unfortunately, because there are Python coercions in the outer loop, I do not know how to use prange.
Another point to note is that the Cython approach seems to use a huge amount of memory, but I do not know where it is being allocated.
Latest version
Following ead's suggestions, below is the latest version of the Cython code:
cimport cython
import numpy as np
cimport numpy as np
@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]
    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer
    cdef np.int32_t sparse_row_pointer
    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data
    cdef np.int32_t[:] indices
    cdef np.int32_t[:] indptr
    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)
    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0
    for i in range(N):
        # Get the sparse matrix values
        data = sigma[i].data
        indices = sigma[i].indices
        indptr = sigma[i].indptr
        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            # sigma[P[i, perm[j], :]
            # against
            # P[i,:]
            # To do this we need our sparse row vector with columns
            # indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            # data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            # P[i, perm[:]]
            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of
            # an equality we add the data to sub_matrices[i,:,:] and
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            if symmetric:
                index_pointer = j # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[P[i, perm[i, j]]]
            while ((index_pointer < k) and (sparse_row_pointer < indptr[P[i, perm[i, j]] + 1])):
                if indices[sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                        data[sparse_row_pointer]
                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                            data[sparse_row_pointer]
                    # Only increment the index pointer
                    index_pointer += 1
                elif indices[sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer += 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer += 1
Parallel version
Below is a parallel version; it does not seem to provide any speed-up, and the code is no longer as pretty:
# See https://stackoverflow.com/questions/48805636/efficient-slicing-of-symmetric-sparse-matrices
cimport cython
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free
from cython.parallel import prange
@cython.boundscheck(False) # turn off bounds-checking for entire function
cpdef sparse_slice_fast_cy(sigma,
                           np.ndarray[np.int32_t, ndim=2] P,
                           np.float64_t[:,:,:] sub_matrices,
                           int symmetric):
    """
    Inputs:
        sigma: A list (N,) of sparse sp.csr_matrix (m x m)
        P: A 2D array of integers (N, k)
        sub_matrices: A 3D array of doubles (N, k, k) containing the slicing
        symmetric: 1 if the sigma matrices are symmetric
    """
    # Create variables for keeping code tidy
    cdef np.int32_t N = P.shape[0]
    cdef np.int32_t k = P.shape[1]
    cdef np.int32_t i
    cdef np.int32_t j
    cdef np.int32_t index_pointer
    cdef np.int32_t sparse_row_pointer
    # Create objects for holding sparse matrix data
    cdef np.float64_t[:] data_mem_view
    cdef np.int32_t[:] indices_mem_view
    cdef np.int32_t[:] indptr_mem_view
    cdef np.float64_t **data = <np.float64_t **> malloc(N * sizeof(np.float64_t *))
    cdef np.int32_t **indices = <np.int32_t **> malloc(N * sizeof(np.int32_t *))
    cdef np.int32_t **indptr = <np.int32_t **> malloc(N * sizeof(np.int32_t *))
    for i in range(N):
        data_mem_view = sigma[i].data
        data[i] = &(data_mem_view[0])
        indices_mem_view = sigma[i].indices
        indices[i] = &(indices_mem_view[0])
        indptr_mem_view = sigma[i].indptr
        indptr[i] = &(indptr_mem_view[0])
    # Object for the ordered P
    cdef np.int32_t[:,:] perm = np.argsort(P, axis=1).astype(np.int32)
    # Make sure sub_matrices is all 0
    sub_matrices[:] = 0
    for i in prange(N, nogil=True):
        for j in range(k):
            # Loop over row P[i, perm[j]] in sigma searching for values
            # in P[i, :] vector i.e. compare
            # sigma[P[i, perm[j], :]
            # against
            # P[i,:]
            # To do this we need our sparse row vector with columns
            # indices[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # and data/values
            # data[indptr[P[i, perm[j]]], indptr[P[i, perm[j]]+1]]
            # which comes from the csr matrix format.
            # We also need our sorted indexing vector
            # P[i, perm[:]]
            # We begin by pointing at the top of both
            # our vectors and gradually move down them. In the event of
            # an equality we add the data to sub_matrices[i,:,:] and
            # increment the INDEXING VECTOR pointer, not the sparse
            # row vector pointer, as there can be multiple values that
            # are the same in the indexing vector but not the sparse row
            # column vector (only 1 column can appear in 1 row!).
            if symmetric:
                index_pointer = j # Only search upper triangular
            else:
                index_pointer = 0
            sparse_row_pointer = indptr[i][P[i, perm[i, j]]]
            while ((index_pointer < k) and
                   (sparse_row_pointer < indptr[i][P[i, perm[i, j]] + 1])):
                if indices[i][sparse_row_pointer] == P[i, perm[i, index_pointer]]:
                    # We can add data to sub_matrices
                    sub_matrices[i, perm[i, j], perm[i, index_pointer]] = \
                        data[i][sparse_row_pointer]
                    if symmetric:
                        sub_matrices[i, perm[i, index_pointer], perm[i, j]] = \
                            data[i][sparse_row_pointer]
                    # Only increment the index pointer
                    index_pointer = index_pointer + 1
                elif indices[i][sparse_row_pointer] > P[i, perm[i, index_pointer]]:
                    # Need to increment index pointer
                    index_pointer = index_pointer + 1
                else:
                    # Need to increment sparse row pointer
                    sparse_row_pointer = sparse_row_pointer + 1
    # Free malloc'd data
    free(data)
    free(indices)
    free(indptr)
Testing
To test the code, run
cythonize -i sparse_slice.pyx
where sparse_slice.pyx is the file name. Then you can use this script:
import time
import numpy as np
import scipy as sp
import scipy.sparse
from sparse_slice import sparse_slice_fast_cy
k = 100
N = 20000
m = 10000
samples = 20
# Create sigma matrices
## The sampling of random sparse takes a while so just do a few and
## then populate with these.
now = time.time()
sigma_samples = []
for i in range(samples):
    sigma_samples.append(sp.sparse.rand(m, m, density=0.001, format='csr'))
    sigma_samples[-1] = sigma_samples[-1] + sigma_samples[-1].T # Symmetric
## Now make the sigma list from these.
sigma = []
for i in range(N):
    j = np.random.randint(samples)
    sigma.append(sigma_samples[j])
print('Time to make sigma: {}'.format(time.time() - now))
# Create indexer
now = time.time()
P = np.empty([N, k]).astype(int)
for i in range(N):
    P[i, :] = np.random.choice(np.arange(m), k, replace=True)
print('Time to make P: {}'.format(time.time() - now))
# Create objects for holding the slices
sub_matrices_slow = np.empty([N, k, k])
sub_matrices_fast = np.empty([N, k, k])
# Run both slicings
## Slow
now = time.time()
for i in range(N):
    sub_matrices_slow[i,:,:] = sigma[i][np.ix_(P[i,:], P[i,:])].todense()
print('Time to make sub_matrices_slow: {}'.format(time.time() - now))
## Fast
symmetric = 1
now = time.time()
sparse_slice_fast_cy(sigma, P.astype(np.int32), sub_matrices_fast, symmetric)
print('Time to make sub_matrices_fast: {}'.format(time.time() - now))
assert(np.all((sub_matrices_slow - sub_matrices_fast)**2 < 1e-6))
Answer 0 (score: 2)
I cannot test this right now, but here are two suggestions:
A) Sort all rows at once, outside of the i-loop:
# Object for the ordered P
cdef long[:,:] perm = np.argsort(P, axis=1)
You may need to pass P as np.ndarray[np.int64_t, ndim=2] P (or whatever its type is) to avoid copying. You then have to access the data via perm[i,X] instead of perm[X].
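As a quick illustration of suggestion A, the following NumPy sketch sorts every row of an index array with a single call and checks that reading P through perm[i, :] yields each row in ascending order. The sizes and the random P are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, m = 5, 4, 100  # small hypothetical sizes
P = rng.integers(0, m, size=(N, k)).astype(np.int32)

# One call sorts every row; perm[i] is the ordering for P[i].
perm = np.argsort(P, axis=1).astype(np.int32)

# Reading P through perm, row by row, gives each row in ascending order.
P_sorted = np.take_along_axis(P, perm, axis=1)
print(P_sorted)
```

Compared with calling np.argsort inside the loop, this moves the Python call overhead out of the N iterations, at the cost of holding an (N, k) integer array.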
B) Define
cdef np.int32_t[:] indices
cdef np.int32_t[:] indptr
so that you do not need to copy the data via .astype, i.e.
for i in range(N):
    data = sigma[i].data
    indices = sigma[i].indices
    indptr = sigma[i].indptr
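I think that, because sigma[i] has O(m) elements, the copying is the bottleneck of your function: you get a running time of O(N*(m+k^2)) instead of O(N*k^2), so it is worth avoiding.
Otherwise the function does not look too bad.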
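The copying cost mentioned above can be seen with a small NumPy check. This sketch uses a stand-in array rather than a real sigma[i].indices, and only shows that astype allocates a new buffer by default while a declaration that matches the existing dtype lets the same buffer be reused.

```python
import numpy as np

# Stand-in for sigma[i].indices (hypothetical data, int32 as in suggestion B).
indices = np.arange(10, dtype=np.int32)

copied = indices.astype(np.int64)           # new buffer: work proportional to len(indices)
same_type = indices.astype(np.int32)        # astype copies by default, even for the same dtype
view = np.asarray(indices, dtype=np.int32)  # dtype already matches, so no copy is made

print(np.shares_memory(indices, copied))
print(np.shares_memory(indices, same_type))
print(np.shares_memory(indices, view))
```

Inside the i-loop, such a copy is repeated for every one of the N matrices, which is where the extra O(m) term per iteration comes from.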
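To make prange work with the i-loop, you should move the access to sigma[i] outside of the loop, by creating a kind of array of pointers to the first elements of data, indices and indptr and populating them in a cheap preprocessing step. One can make it work, but the question is how much is gained by parallelization: quite possibly the problem is memory-bound, so one has to look at the timings.
You could also exploit the symmetry by handling only the upper triangular matrix:
...
index_pointer = j #only upper triangle!
....
....
# We can add data to sub_matrices
#upper triangle sub-matrix:
sub_matrices[i, perm[j], perm[index_pointer]] = \
    data[sparse_row_pointer]
#lower triangle sub-matrix:
sub_matrices[i, perm[index_pointer], perm[j]] = \
    data[sparse_row_pointer]
....
I would start with B) and see how it works out...
Edit
Regarding memory usage: one can measure the peak memory usage via
/usr/bin/time -f "peak_used_memory:%M(in Kb)" python test.py
I ran my tests with N=2000 and got (python3.6 + cython0.27.1):
                      peak memory usage
only slow             245Mb
only fast             245Mb
slow+fast no check    402Mb
slow+fast+assert      576Mb
So there is an overhead of about 50Mb, about 200Mb is used by either function, and another 176Mb is needed for evaluating the assert. I see the same behaviour for other values of N.
So I would say that Cython does not use a large amount of memory.
This task is very likely (at least partly) memory-bound, so parallelization will not help. You should reduce the amount of memory that has to be loaded into the cache.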
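One possibility is not to use perm; after all, it also needs to be loaded into the cache. Instead, you could work with a sorted P and use that directly. I guess you could gain about 20-30% in the best case.
Sometimes Cython produces code that is not easy for the C compiler to optimize, and writing the routine directly in C and then wrapping it with Python gives better results.
However, I would do all of this only if this operation really is the bottleneck of your program.
By the way, by declaring P with its actual type, you do not need the extra copy.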
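As an alternative to the /usr/bin/time measurement discussed above, the peak memory can also be read from inside the script with the standard-library resource module. This is a minimal sketch and not part of the original measurements: it assumes a Unix-like system (resource is not available on Windows), the helper name peak_memory_mb is made up, and the bytearray is only a stand-in for the real slicing workload.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident set size of this process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_memory_mb()
buffer = bytearray(50 * 1024 * 1024)  # stand-in for the real workload
after = peak_memory_mb()
print("peak before: {:.0f} MB, after: {:.0f} MB".format(before, after))
```

Calling the helper after the slow loop, after the fast call and after the assert in the test script would give a per-step breakdown similar to the table in the answer.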