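I have a sparse csr_matrix and want to replace the values of a single row with new values, but I can't find a simple and efficient way to do it. This is what it should do: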
A = csr_matrix([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
new_row = np.array([-1, -1, -1])
print(set_row_csr(A, 2, new_row).todense())
>>> [[ 0,  1,  0],
     [ 1,  0,  1],
     [-1, -1, -1]]
This is my current implementation of set_row_csr:
def set_row_csr(A, row_idx, new_row):
    A[row_idx, :] = new_row
    return A
But this gives me a SparseEfficiencyWarning. Is there a way to do this without manual index juggling, or is that my only option?
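For readers who want to reproduce the situation, here is a minimal sketch of the slice assignment from the question. It records any warnings instead of printing them, so you can check whether SparseEfficiencyWarning is among them. The variable names `caught` and `saw_efficiency_warning` are illustrative and not part of the original question.

```python
import warnings

import numpy as np
from scipy.sparse import SparseEfficiencyWarning, csr_matrix

A = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]]))
new_row = np.array([-1, -1, -1])

# Record warnings instead of printing them, so they can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Row 2 only stores column 1, so this assignment has to add new stored entries.
    A[2, :] = new_row

saw_efficiency_warning = any(
    issubclass(w.category, SparseEfficiencyWarning) for w in caught
)
print(saw_efficiency_warning)
print(A.toarray())
```

The assignment itself produces the desired result; the warning only signals that changing the sparsity structure of a CSR matrix is expensive.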
Answer 0 (score: 3)
In the end, I managed to get this done with index juggling.
import numpy as np
from scipy import sparse

def set_row_csr(A, row_idx, new_row):
    '''
    Replace a row in a CSR sparse matrix A.

    Parameters
    ----------
    A: csr_matrix
        Matrix to change
    row_idx: int
        index of the row to be changed
    new_row: np.array
        list of new values for the row of A

    Returns
    -------
    None (the matrix A is changed in place)

    Prerequisites
    -------------
    The row index shall be smaller than the number of rows in A
    The number of elements in new row must be equal to the number of columns in matrix A
    '''
    assert sparse.isspmatrix_csr(A), 'A shall be a csr_matrix'
    assert row_idx < A.shape[0], \
        'The row index ({0}) shall be smaller than the number of rows in A ({1})' \
        .format(row_idx, A.shape[0])
    try:
        N_elements_new_row = len(new_row)
    except TypeError:
        msg = 'Argument new_row shall be a list or numpy array, is now a {0}'\
            .format(type(new_row))
        raise AssertionError(msg)
    N_cols = A.shape[1]
    assert N_cols == N_elements_new_row, \
        'The number of elements in new row ({0}) must be equal to ' \
        'the number of columns in matrix A ({1})' \
        .format(N_elements_new_row, N_cols)

    # The stored entries of the row occupy data[idx_start_row:idx_end_row]
    idx_start_row = A.indptr[row_idx]
    idx_end_row = A.indptr[row_idx + 1]
    # The new row is stored densely: all N_cols entries, zeros included
    additional_nnz = N_cols - (idx_end_row - idx_start_row)

    A.data = np.r_[A.data[:idx_start_row], new_row, A.data[idx_end_row:]]
    A.indices = np.r_[A.indices[:idx_start_row], np.arange(N_cols), A.indices[idx_end_row:]]
    # Shift the row pointers of all following rows by the change in stored entries
    A.indptr = np.r_[A.indptr[:row_idx + 1], A.indptr[(row_idx + 1):] + additional_nnz]
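To make the index juggling easier to follow, here is a small sketch that prints the three CSR arrays for the 3x3 example from the question and shows how the row pointers would shift when row 2 is replaced by a dense row. The names `extra` and `shifted` are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]]))

# CSR stores only the nonzero entries, row by row.
print(A.data)     # [1 1 1 1]  values of the stored entries
print(A.indices)  # [1 0 2 1]  column index of each stored entry
print(A.indptr)   # [0 1 3 4]  row i occupies data[indptr[i]:indptr[i + 1]]

row_idx = 2
start, end = int(A.indptr[row_idx]), int(A.indptr[row_idx + 1])
old_cols = A.indices[start:end].tolist()  # columns currently stored in row 2
old_vals = A.data[start:end].tolist()     # values currently stored in row 2
print(old_cols, old_vals)

# Replacing the row with a dense row of n_cols entries changes the number of
# stored entries by `extra`; every later row pointer moves by that amount.
n_cols = A.shape[1]
extra = n_cols - (end - start)
shifted = A.indptr.copy()
shifted[row_idx + 1:] += extra
print(shifted)
```

Only the slice between `start` and `end` of `data` and `indices` is swapped out, which is why this is cheaper than a generic slice assignment.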
Answer 1 (score: 2)
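physicalattraction's answer is indeed significantly faster. It is much faster than my solution, which simply adds a separate matrix in which only that single row is set. The addition solution is, however, still faster than the slicing solution.
The takeaway for me is that the fastest way to set rows in a csr_matrix, or columns in a csc_matrix, is to modify the underlying data yourself.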
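import time

import numpy as np
from scipy.sparse import csr_matrix

def time_copy(A, num_tries = 10000):
    # Time the copies alone so they can be subtracted from the measurements
    start = time.time()
    for i in range(num_tries):
        B = A.copy()
    end = time.time()
    return end - start

def test_method(func, A, row_idx, new_row, num_tries = 10000):
    start = time.time()
    for i in range(num_tries):
        func(A.copy(), row_idx, new_row)
    end = time.time()
    copy_time = time_copy(A, num_tries)
    print("Duration {}".format((end - start) - copy_time))

def set_row_csr_slice(A, row_idx, new_row):
    A[row_idx,:] = new_row

def set_row_csr_addition(A, row_idx, new_row):
    # Build a matrix whose only stored row is row_idx, then add it to A.
    # Note: this adds new_row to the existing row rather than replacing it.
    # indptr needs one entry per row plus one, hence A.shape[0] + 1.
    indptr = np.zeros(A.shape[0] + 1, dtype=int)
    indptr[row_idx +1:] = A.shape[1]
    indices = np.arange(A.shape[1])
    A += csr_matrix((new_row, indices, indptr), shape=A.shape)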
>>> A = csr_matrix((np.ones(1000), (np.random.randint(0,1000,1000), np.random.randint(0, 1000, 1000))))
>>> test_method(set_row_csr_slice, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 4.938395977020264
>>> test_method(set_row_csr_addition, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 2.4161765575408936
>>> test_method(set_row_csr, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 0.8432261943817139
The slicing solution also scales much worse with the size and sparsity of the matrix.
# Larger matrix, same fraction sparsity
>>> A = csr_matrix((np.ones(10000), (np.random.randint(0,10000,10000), np.random.randint(0, 10000, 10000))))
>>> test_method(set_row_csr_slice, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 18.335174798965454
>>> test_method(set_row_csr, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 1.1089558601379395
# Super sparse matrix
>>> A = csr_matrix((np.ones(100), (np.random.randint(0,10000,100), np.random.randint(0, 10000, 100))))
>>> test_method(set_row_csr_slice, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 13.371600151062012
>>> test_method(set_row_csr, A, 200, np.ones(A.shape[1]), num_tries = 10000)
Duration 1.0454308986663818
Answer 2 (score: 0)
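Something is wrong with this set_row_csr. Yes, it is fast, and it seems to work for some test cases. However, in my test cases it seems to corrupt the internal CSR structure of the sparse matrix. Try lil_matrix(A) afterwards and you will see an error message.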
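The answer does not include the error message, so the cause can only be guessed. One possibility (an assumption, not confirmed by the thread) is that concatenating with np.arange changes the dtype of indices so that it no longer matches indptr; another is that the dense row leaves explicit zeros in the matrix. The sketch below is a hypothetical variant, here called replace_row_dense, that keeps the index dtype unchanged, validates the result with check_format, and removes explicit zeros before converting to lil_matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix

def replace_row_dense(A, row_idx, new_row):
    """Hypothetical variant of the index-juggling approach with dtype care."""
    start, end = int(A.indptr[row_idx]), int(A.indptr[row_idx + 1])
    n_cols = A.shape[1]
    new_row = np.asarray(new_row)

    A.data = np.concatenate([A.data[:start], new_row, A.data[end:]])
    # Keep the same integer dtype for the new column indices as the old ones.
    A.indices = np.concatenate([
        A.indices[:start],
        np.arange(n_cols, dtype=A.indices.dtype),
        A.indices[end:],
    ])
    # Shift the row pointers of the following rows in place (dtype unchanged).
    indptr = A.indptr.copy()
    indptr[row_idx + 1:] += n_cols - (end - start)
    A.indptr = indptr

    A.check_format(full_check=True)  # raises ValueError if the arrays are inconsistent
    A.eliminate_zeros()              # drop the explicit zeros stored by the dense row

A = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]]))
replace_row_dense(A, 2, [-1, 0, -1])
converted = lil_matrix(A)  # the conversion the answer reports as failing
print(converted.toarray())
```

If the conversion still fails in your setup, the error message itself is the best clue; this sketch only covers the two assumed causes named above.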
Answer 3 (score: 0)
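In physicalattraction's answer, len(new_row) must be equal to A.shape[1], which may not be what you want when inserting sparse rows.
So, based on that answer, I came up with a method that sets a row in a CSR matrix while preserving sparsity. I also added a function that converts a dense array into a sparse representation (data and indices).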
import numpy as np
from scipy.sparse import isspmatrix_csr

def to_sparse(dense_arr):
    pairs = [(data, index) for index, data in enumerate(dense_arr) if data != 0]
    # An all-zero row has no entries to unzip
    if not pairs:
        return [], []
    # Convert list of tuples to lists
    pairs = list(map(list, zip(*pairs)))
    # Return data and indices
    return pairs[0], pairs[1]

def set_row_csr_unbounded(A, row_idx, new_row_data, new_row_indices):
    '''
    Replace a row in a CSR sparse matrix A.

    Parameters
    ----------
    A: csr_matrix
        Matrix to change
    row_idx: int
        index of the row to be changed
    new_row_data: np.array
        list of new values for the row of A
    new_row_indices: np.array
        list of indices for new row

    Returns
    -------
    None (the matrix A is changed in place)

    Prerequisites
    -------------
    The row index shall be smaller than the number of rows in A
    Row data and row indices must have the same size
    '''
    assert isspmatrix_csr(A), 'A shall be a csr_matrix'
    assert row_idx < A.shape[0], \
        'The row index ({0}) shall be smaller than the number of rows in A ({1})' \
        .format(row_idx, A.shape[0])
    try:
        N_elements_new_row = len(new_row_data)
    except TypeError:
        msg = 'Argument new_row_data shall be a list or numpy array, is now a {0}'\
            .format(type(new_row_data))
        raise AssertionError(msg)
    try:
        assert N_elements_new_row == len(new_row_indices), \
            'new_row_data and new_row_indices must have the same size'
    except TypeError:
        msg = 'Argument new_row_indices shall be a list or numpy array, is now a {0}'\
            .format(type(new_row_indices))
        raise AssertionError(msg)

    idx_start_row = A.indptr[row_idx]
    idx_end_row = A.indptr[row_idx + 1]
    # Later row pointers shift by the change in stored entries: the new count
    # minus the entries the old row already had
    additional_nnz = N_elements_new_row - (idx_end_row - idx_start_row)

    A.data = np.r_[A.data[:idx_start_row], new_row_data, A.data[idx_end_row:]]
    A.indices = np.r_[A.indices[:idx_start_row], new_row_indices, A.indices[idx_end_row:]]
    A.indptr = np.r_[A.indptr[:row_idx + 1], A.indptr[(row_idx + 1):] + additional_nnz]
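As a side note, the dense-to-sparse conversion can also be written with NumPy alone. The following is a minimal sketch under the assumption that the input is a one-dimensional array-like; the name to_sparse_np is hypothetical and not part of the answer.

```python
import numpy as np

def to_sparse_np(dense_arr):
    """Hypothetical NumPy variant: return (data, indices) of the nonzero entries."""
    dense_arr = np.asarray(dense_arr)
    indices = np.flatnonzero(dense_arr)  # positions of the nonzero entries, ascending
    return dense_arr[indices], indices

data, indices = to_sparse_np([0, 3, 0, -2])
print(data, indices)

# An all-zero row simply yields two empty arrays.
empty_data, empty_indices = to_sparse_np([0, 0, 0])
print(len(empty_data), len(empty_indices))
```

Because np.flatnonzero returns the positions in ascending order, the column indices come out sorted, which is the order CSR expects within a row in canonical form.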
Answer 4 (score: 0)
Here is my approach:
A = A.tolil()
A[index, :] = new_row
A = A.tocsr()
Simply convert to lil_matrix, change the row, and convert back.
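A self-contained sketch of this round trip, using the 3x3 matrix from the question, could look as follows. The helper name set_row_via_lil is hypothetical. Unlike the in-place approaches above, it returns a new matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

def set_row_via_lil(A, row_idx, new_row):
    """Hypothetical helper: return a new CSR matrix with one row replaced."""
    B = A.tolil()            # LIL is designed for changes to the sparsity structure
    B[row_idx, :] = new_row
    return B.tocsr()

A = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]]))
A = set_row_via_lil(A, 2, np.array([-1, -1, -1]))
print(A.toarray())
```

Both conversions copy the whole matrix, so the cost grows with the number of stored entries. That is simple and safe for an occasional update, but probably not the fastest choice when many rows are changed one at a time.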