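How to save/load a scipy sparse csr_matrix in a portable format?
The scipy sparse matrix is created on Python 3 (Windows 64-bit) and needs to run on Python 2 (Linux 64-bit). Initially I used pickle (with protocol=2 and fix_imports=True), but this did not work going from Python 3.2.2 (Windows 64-bit) to Python 2.7.2 (Windows 32-bit) and produced the error: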
TypeError: ('data type not understood', <built-in function _reconstruct>, (<type 'numpy.ndarray'>, (0,), '[98]')).
Next, I tried numpy.save and numpy.load as well as scipy.io.mmwrite() and scipy.io.mmread(), but none of these methods worked either.
Answer 0 (score: 102)
Edit: SciPy 0.19 now has scipy.sparse.save_npz and scipy.sparse.load_npz.
from scipy import sparse
sparse.save_npz("yourmatrix.npz", your_matrix)
your_matrix_back = sparse.load_npz("yourmatrix.npz")
For both functions, the file argument may also be a file-like object (i.e. the result of open) instead of a filename.
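As a rough illustration of the file-like variant, the sketch below writes to an in-memory io.BytesIO buffer instead of a path. The matrix contents and the variable names are made up for the example; a file opened with open(..., "wb") / open(..., "rb") should behave the same way.

```python
import io

import numpy as np
from scipy import sparse

# Small example matrix; any CSR matrix would do.
original = sparse.csr_matrix(np.array([[0, 0, 3], [4, 0, 0]]))

# Write the .npz payload into an in-memory binary buffer instead of a path.
buffer = io.BytesIO()
sparse.save_npz(buffer, original)

# Rewind before reading, exactly as with a real file opened in binary mode.
buffer.seek(0)
restored = sparse.load_npz(buffer)

print(restored.format, restored.shape)
```

Note that the file object has to be binary; a text-mode handle will not work for the .npz payload.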
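Got an answer from the SciPy user group:
A csr_matrix has 3 data attributes that matter: .data, .indices, and .indptr. All are simple ndarrays, so numpy.save will work on them. Save the three arrays with numpy.save or numpy.savez, load them back with numpy.load, and then recreate the sparse matrix object with: new_csr = csr_matrix((data, indices, indptr), shape=(M, N))
So for example: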
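import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # np.savez appends ".npz" to filename when it is missing
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # pass the full name here, including the ".npz" extension
    loader = np.load(filename)
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])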
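To make the three arrays less abstract, here is a small sketch of what they contain for a 3x3 matrix with an empty first row. The matrix and the names dense, m and rebuilt are only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0, 0, 0],
                  [1, 0, 0],
                  [0, 2, 0]])
m = csr_matrix(dense)

data = m.data        # stored values, row by row: [1, 2]
indices = m.indices  # column index of each stored value: [0, 1]
indptr = m.indptr    # row i owns data[indptr[i]:indptr[i + 1]]: [0, 0, 1, 2]

# The same three arrays plus the shape are enough to rebuild the matrix.
rebuilt = csr_matrix((data, indices, indptr), shape=m.shape)
print(rebuilt.toarray())
```

The repeated 0 at the start of indptr is how the empty first row is encoded, which is why the shape alone is not enough and indptr has to be saved as well.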
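Answer 1 (score: 35)
Although you write that scipy.io.mmwrite and scipy.io.mmread don't work for you, I just want to add how they work. This question is the no. 1 Google hit, so I myself started with np.savez and pickle.dump before switching to the simple and obvious scipy functions. They work for me and shouldn't be overlooked by those who haven't tried them yet.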
from scipy import sparse, io
m = sparse.csr_matrix([[0,0,0],[1,0,0],[0,1,0]])
m # <3x3 sparse matrix of type '<type 'numpy.int64'>' with 2 stored elements in Compressed Sparse Row format>
io.mmwrite("test.mtx", m)
del m
newm = io.mmread("test.mtx")
newm # <3x3 sparse matrix of type '<type 'numpy.int32'>' with 2 stored elements in COOrdinate format>
newm.tocsr() # <3x3 sparse matrix of type '<type 'numpy.int32'>' with 2 stored elements in Compressed Sparse Row format>
newm.toarray() # array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=int32)
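Since mmread hands back a COO matrix, a round trip that ends in CSR needs an explicit conversion. The following is a minimal sketch of that, assuming a temporary directory is acceptable for the test file; the names path, loaded and as_csr are illustrative.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

m = sparse.csr_matrix(np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]]))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.mtx")
    io.mmwrite(path, m)
    loaded = io.mmread(path)  # comes back in COO format

# Convert explicitly to get the original storage format back.
as_csr = loaded.tocsr()
same_values = np.array_equal(as_csr.toarray(), m.toarray())
print(loaded.format, as_csr.format, same_values)
```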
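Answer 2 (score: 25)
Here is a performance comparison of the three most upvoted answers, run in a Jupyter notebook. The input is a 1M x 100k random sparse matrix with density 0.001, containing 100M non-zero values: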
from scipy.sparse import random
matrix = random(1000000, 100000, density=0.001, format='csr')
matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>
io.mmwrite / io.mmread
from scipy import io
%time io.mmwrite('test_io.mtx', matrix)
CPU times: user 4min 37s, sys: 2.37 s, total: 4min 39s
Wall time: 4min 39s
%time matrix = io.mmread('test_io.mtx')
CPU times: user 2min 41s, sys: 1.63 s, total: 2min 43s
Wall time: 2min 43s
matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in COOrdinate format>
Filesize: 3.0G.
(Note that the format has changed from csr to coo.)
np.savez / np.load
import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # note that .npz extension is added automatically
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # here we need to add .npz extension manually
    loader = np.load(filename + '.npz')
    return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                      shape=loader['shape'])
%time save_sparse_csr('test_savez', matrix)
CPU times: user 1.26 s, sys: 1.48 s, total: 2.74 s
Wall time: 2.74 s
%time matrix = load_sparse_csr('test_savez')
CPU times: user 1.18 s, sys: 548 ms, total: 1.73 s
Wall time: 1.73 s
matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>
Filesize: 1.1G.
cPickle
import cPickle as pickle

def save_pickle(matrix, filename):
    with open(filename, 'wb') as outfile:
        pickle.dump(matrix, outfile, pickle.HIGHEST_PROTOCOL)

def load_pickle(filename):
    with open(filename, 'rb') as infile:
        matrix = pickle.load(infile)
    return matrix
%time save_pickle(matrix, 'test_pickle.mtx')
CPU times: user 260 ms, sys: 888 ms, total: 1.15 s
Wall time: 1.15 s
%time matrix = load_pickle('test_pickle.mtx')
CPU times: user 376 ms, sys: 988 ms, total: 1.36 s
Wall time: 1.37 s
matrix
<1000000x100000 sparse matrix of type '<type 'numpy.float64'>'
with 100000000 stored elements in Compressed Sparse Row format>
Filesize: 1.1G.
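Note: cPickle does not work with very large objects (see this answer). In my experience, it failed for a 2.7M x 50k matrix with 270M non-zero values, while the np.savez solution worked well.
Based on this simple test for CSR matrices, cPickle is the fastest method, but it doesn't work with very large matrices; np.savez is only slightly slower; io.mmwrite is much slower, produces a bigger file, and restores to a different format (COO rather than CSR). So np.savez is the winner here.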
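For anyone who wants to repeat the comparison outside a notebook, here is a scaled-down sketch. The matrix size, the helper names (save_savez, save_mm, measure) and the temporary file names are my own choices, not part of the benchmark above, and the numbers it prints will differ from the ones reported there.

```python
import os
import tempfile
import time

import numpy as np
from scipy import io, sparse

# Much smaller than the 1M x 100k matrix above, so it runs in well under a second.
matrix = sparse.random(2000, 1000, density=0.01, format="csr", random_state=0)

def save_savez(path, m):
    # Same three-array layout as save_sparse_csr above.
    np.savez(path, data=m.data, indices=m.indices, indptr=m.indptr, shape=m.shape)

def save_mm(path, m):
    io.mmwrite(path, m)

def measure(save, path, m):
    """Return (seconds taken by one save call, resulting file size in bytes)."""
    start = time.perf_counter()
    save(path, m)
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(path)

with tempfile.TemporaryDirectory() as tmpdir:
    results = {
        "np.savez": measure(save_savez, os.path.join(tmpdir, "m.npz"), matrix),
        "io.mmwrite": measure(save_mm, os.path.join(tmpdir, "m.mtx"), matrix),
    }

for name, (seconds, size) in results.items():
    print(f"{name}: {seconds:.4f} s, {size} bytes")
```

A single timed call is noisy; for anything more than a quick sanity check, repeating each save several times and taking the minimum would be more trustworthy.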
Answer 3 (score: 16)
You can now use scipy.sparse.save_npz:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.save_npz.html
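Answer 4 (score: 11)
Assuming you have scipy on both machines, you can just use pickle.
However, be sure to specify a binary protocol when pickling numpy arrays. Otherwise you'll wind up with a huge file.
At any rate, you should be able to do this: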
import cPickle as pickle
import numpy as np
import scipy.sparse
# Just for testing, let's make a dense array and convert it to a csr_matrix
x = np.random.random((10,10))
x = scipy.sparse.csr_matrix(x)
with open('test_sparse_array.dat', 'wb') as outfile:
    pickle.dump(x, outfile, pickle.HIGHEST_PROTOCOL)
You can then load it with:
import cPickle as pickle
with open('test_sparse_array.dat', 'rb') as infile:
    x = pickle.load(infile)
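The snippets above are Python 2 (cPickle no longer exists in Python 3). A rough Python 3 spelling is sketched below, using an in-memory round trip so it is self-contained. Choosing protocol=2 is an assumption on my part, made because it is the highest protocol Python 2 can read; it does not by itself resolve the cross-version error reported in the question.

```python
import pickle

import numpy as np
import scipy.sparse

x = scipy.sparse.csr_matrix(np.eye(3))

# Protocol 2 is binary and is the newest protocol Python 2 understands.
# If only Python 3 is involved, pickle.HIGHEST_PROTOCOL is the usual choice.
payload = pickle.dumps(x, protocol=2)

y = pickle.loads(payload)
print(type(y).__name__, y.shape)
```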
Answer 5 (score: 9)
As of scipy 0.19.0, you can save and load sparse matrices this way:
from scipy import sparse
data = sparse.csr_matrix((3, 4))
#Save
sparse.save_npz('data_sparse.npz', data)
#Load
data = sparse.load_npz("data_sparse.npz")
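save_npz is not limited to CSR, and load_npz returns the matrix in the format it was saved in. Here is a small sketch with a COO matrix; the file name and the use of a temporary directory are just for the example.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

coo = sparse.coo_matrix(np.array([[0, 5], [7, 0]]))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "coo.npz")
    # compressed=True is the default; pass False to trade file size for speed.
    sparse.save_npz(path, coo, compressed=True)
    back = sparse.load_npz(path)

print(back.format)
```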
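Answer 6 (score: 1)
Adding my two cents: for me, npz is not portable, as I can't use it to easily export my matrix to non-Python clients (e.g. PostgreSQL; glad to be corrected). So I would have liked CSV output for the sparse matrix (much like what you get when you print() the sparse matrix). How to achieve this depends on the representation of the sparse matrix. For a CSR matrix, the following code emits CSV output. You can adapt it for other representations.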
import numpy as np

def csr_matrix_tuples(m):
    # visit only rows that store at least one value, so empty rows cost nothing
    nonempty_rows = np.flatnonzero(np.diff(m.indptr))
    for i in nonempty_rows:
        start_index, end_index = m.indptr[i], m.indptr[i + 1]
        for j, data in zip(m.indices[start_index:end_index], m.data[start_index:end_index]):
            yield (i, j, data)

for i, j, data in csr_matrix_tuples(my_csr_matrix):
    print(i, j, data, sep=',')
In my tests, this is about 2 times slower than save_npz in the current implementation.
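For representations other than CSR, one possible adaptation is to go through COO, which already exposes row, column and value arrays. The sketch below is my own variation rather than the code timed above; write_triples is a hypothetical helper name and the 3x3 matrix is just sample data.

```python
import csv
import io

import numpy as np
from scipy import sparse

m = sparse.csr_matrix(np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0]]))

def write_triples(matrix, stream):
    """Write one row,column,value line per stored entry of any sparse matrix."""
    coo = matrix.tocoo()
    writer = csv.writer(stream, lineterminator="\n")
    # tolist() turns NumPy scalars into plain Python numbers for clean output.
    for i, j, value in zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()):
        writer.writerow([i, j, value])

out = io.StringIO()
write_triples(m, out)
csv_text = out.getvalue()
print(csv_text, end="")
```

The matrix shape is not part of this output, so a consumer that needs trailing empty rows or columns has to be told the dimensions separately.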
Answer 7 (score: 0)
This is what I used to save a lil_matrix.
import numpy as np
from scipy.sparse import lil_matrix

def save_sparse_lil(filename, array):
    # use np.savez_compressed(..) for compression
    np.savez(filename, dtype=array.dtype.str, data=array.data,
             rows=array.rows, shape=array.shape)

def load_sparse_lil(filename):
    # data and rows are object arrays, so recent NumPy needs allow_pickle=True
    loader = np.load(filename, allow_pickle=True)
    result = lil_matrix(tuple(loader["shape"]), dtype=str(loader["dtype"]))
    result.data = loader["data"]
    result.rows = loader["rows"]
    return result
I must say I found NumPy's np.load(..) to be very slow. This is my current solution, which feels much faster to me:
from scipy.sparse import lil_matrix
import numpy as np
import json

def _to_object_array(rows):
    # one Python list per matrix row, stored in a 1-D object array;
    # np.array(rows) would build a 2-D array (or fail on ragged rows) instead
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = list(row)
    return out

def lil_matrix_to_dict(myarray):
    result = {
        "dtype": myarray.dtype.str,
        "shape": myarray.shape,
        "data": myarray.data,
        "rows": myarray.rows
    }
    return result

def lil_matrix_from_dict(mydict):
    result = lil_matrix(tuple(mydict["shape"]), dtype=mydict["dtype"])
    result.data = _to_object_array(mydict["data"])
    result.rows = _to_object_array(mydict["rows"])
    return result

def load_lil_matrix(filename):
    with open(filename, "r", encoding="utf-8") as infile:
        mydict = json.load(infile)
    return lil_matrix_from_dict(mydict)

def save_lil_matrix(filename, myarray):
    with open(filename, "w", encoding="utf-8") as outfile:
        mydict = lil_matrix_to_dict(myarray)
        # default= turns NumPy arrays and scalars into plain Python values
        json.dump(mydict, outfile, default=lambda o: o.tolist())
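Another option, which I have not timed against the JSON approach, is to convert the LIL matrix to CSR, use scipy.sparse.save_npz (which does not accept LIL directly), and convert back after loading. A minimal sketch with an in-memory buffer and made-up values:

```python
import io

import numpy as np
from scipy import sparse

lil = sparse.lil_matrix((3, 4))
lil[0, 1] = 2.5
lil[2, 3] = -1.0

# save_npz handles CSR, so convert on the way out...
buffer = io.BytesIO()
sparse.save_npz(buffer, lil.tocsr())

# ...and convert back to LIL on the way in.
buffer.seek(0)
lil_back = sparse.load_npz(buffer).tolil()
print(lil_back.format, lil_back.shape)
```

The two conversions cost extra time and memory, so whether this beats the JSON route depends on the matrix.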
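Answer 8 (score: 0)
I was asked to send the matrix in a simple, generic format (one row,column,value line per non-zero entry), and ended up with this:
import numpy as np

def save_sparse_matrix(m, filename):
    nonZeros = np.array(m.nonzero())
    # the with block closes the file even if a write fails
    with open(filename, 'w') as thefile:
        for entry in range(nonZeros.shape[1]):
            thefile.write("%s,%s,%s\n" % (nonZeros[0, entry], nonZeros[1, entry], m[nonZeros[0, entry], nonZeros[1, entry]]))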
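Reading such a file back could look roughly like the sketch below. The loader is not part of the answer: load_sparse_triples is a hypothetical name, and because the text format does not record the matrix dimensions, the shape has to be supplied by the caller.

```python
import io

import numpy as np
from scipy.sparse import coo_matrix

def load_sparse_triples(stream, shape):
    """Rebuild a CSR matrix from row,column,value lines; shape must be supplied."""
    # ndmin=2 keeps the result two-dimensional even for a single line.
    triples = np.loadtxt(stream, delimiter=",", ndmin=2)
    rows = triples[:, 0].astype(int)
    cols = triples[:, 1].astype(int)
    return coo_matrix((triples[:, 2], (rows, cols)), shape=shape).tocsr()

# Two sample lines in the row,column,value layout, read from memory.
text = "1,0,1.0\n2,1,2.0\n"
m = load_sparse_triples(io.StringIO(text), shape=(3, 3))
print(m.toarray())
```

np.loadtxt also accepts a file name, so the same function should work on a file written by save_sparse_matrix. Values come back as floats in this sketch.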
Answer 9 (score: 0)
This works for me:
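import numpy as np
import scipy.sparse as sp
file = "matrices.npz"  # placeholder path; any .npz file name works
x = sp.csr_matrix([1,2,3])
y = sp.csr_matrix([2,3,4])
np.savez(file, x=x, y=y)
# the matrices are stored as pickled objects, so recent NumPy needs allow_pickle=True
npz = np.load(file, allow_pickle=True)
>>> npz['x'].tolist()
<1x3 sparse matrix of type '<class 'numpy.int64'>'
with 3 stored elements in Compressed Sparse Row format>
>>> npz['x'].tolist().toarray()
array([[1, 2, 3]], dtype=int64)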
The trick is to call .tolist() to turn the 0-dimensional object array (shape ()) back into the original object.
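To see what is going on, the sketch below repeats the round trip in memory and inspects the loaded entry. As far as I can tell, .item() is an equivalent spelling of .tolist() for a 0-dimensional array. The allow_pickle=True flag is what recent NumPy versions require for object arrays, and since it unpickles arbitrary objects it should only be used on files you trust.

```python
import io

import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix([1, 2, 3])

# Save to an in-memory buffer; np.savez wraps x in an object array.
buffer = io.BytesIO()
np.savez(buffer, x=x)
buffer.seek(0)

with np.load(buffer, allow_pickle=True) as npz:
    boxed = npz["x"]

# boxed is a 0-dimensional object array holding the sparse matrix.
via_tolist = boxed.tolist()
via_item = boxed.item()
print(boxed.ndim, boxed.dtype, via_item.toarray())
```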