Question

I have the following:

(Pdb) training
array(<418326x223957 sparse matrix of type '<type 'numpy.float64'>'
    with 165657096 stored elements in Compressed Sparse Row format>, dtype=object)
(Pdb) training.shape
()

Why is there no shape information?

Edit: here is what I did:

training, target, test, projectids = generate_features(outcomes, projects, resources)
target = np.array([1. if i == 't' else 0. for i in target])
projectids = np.array([i for i in projectids])

print 'vectorizing training features'
d = DictVectorizer(sparse=True)
training = d.fit_transform(training[:10].T.to_dict().values())
#test_data = d.fit_transform(training.T.to_dict().values())
test_data = d.transform(test[:10].T.to_dict().values())

# Print both dimensions; indexing with [1] would print a row, not a dimension.
print 'training shape: %s, %s' %(training.shape[0], training.shape[1])
print 'test shape: %s, %s' %(test_data.shape[0], test_data.shape[1])

print 'saving vectorized instances'
with open(filename, "wb") as f:
    np.save(f, training)
    np.save(f, test_data)
    np.save(f, target)
    np.save(f, projectids)

At this point, the shape of training is still (10, 121).

Later, I simply reinitialize the four variables with:

with open("../data/f1/training.dat", "rb") as f:
    training = np.load(f)
    test_data = np.load(f)
    target = np.load(f)
    projectids = np.load(f)

But the shape is gone.

Answer 1

There is shape information in:

array(<418326x223957 sparse matrix of type '<type 'numpy.float64'>'
    with 165657096 stored elements in Compressed Sparse Row format>, dtype=object)

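This is an array containing one item. It is 0-dimensional, hence the shape (). That one item has dtype=object. Specifically, it is a sparse matrix, and its dimensions appear in the display: <418326x223957 sparse matrix ...>.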
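To make the 0-dimensional case concrete, here is a minimal sketch using only NumPy. The `payload` dict is a hypothetical stand-in for the sparse matrix (it is not from the question); the point is only that any single object NumPy cannot read as array data ends up in a wrapper whose own shape is ().

```python
import numpy as np

# Hypothetical stand-in for the sparse matrix: a single Python object
# that NumPy cannot interpret as numeric array data.
payload = {"rows": 418326, "cols": 223957}

# Wrapping one object gives a 0-dimensional array of dtype=object.
wrapper = np.array(payload, dtype=object)

print(wrapper.shape)  # ()
print(wrapper.ndim)   # 0
print(wrapper.size)   # 1

# Indexing with an empty tuple reaches the single wrapped object.
inner = wrapper[()]
print(inner is payload)  # True
```

The wrapper reports its own shape, not that of the object inside it, which is why the dimensions only show up in the repr of the wrapped item.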

I was going to ask about DictVectorizer and fit_transform, but they don't matter here. It is the save and load operations that change the value.

My guess is that you are not loading the file you just wrote.

Your np.save(f, training) is wrapping the sparse matrix in an np.array with dtype object. That is what you see when you load it.

training = training.item()

This takes the sparse matrix out of that array wrapper.
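The round trip can be reproduced on a small matrix. This is a sketch under a few assumptions: it uses a 3x4 matrix in place of the real training data, an in-memory `io.BytesIO` buffer in place of the file on disk, and a recent NumPy, where `np.load` needs `allow_pickle=True` to read object arrays (the Python 2 era code in the question did not need that flag).

```python
import io

import numpy as np
from scipy import sparse

# Small stand-in for the real training matrix.
matrix = sparse.csr_matrix(np.eye(3, 4))

# np.save converts its argument to an ndarray first. A sparse matrix is
# not an ndarray, so it gets pickled inside a 0-d object array.
buf = io.BytesIO()  # in-memory file, used here instead of a file on disk
np.save(buf, matrix)
buf.seek(0)

# Recent NumPy versions refuse to unpickle object arrays unless asked to.
loaded = np.load(buf, allow_pickle=True)
print(loaded.shape)  # shape of the wrapper, not of the matrix
print(loaded.dtype)

# .item() returns the single object stored in the 0-d array.
restored = loaded.item()
print(restored.shape)  # shape of the sparse matrix itself
```

After `.item()`, `restored` is an ordinary sparse matrix again, so `restored.shape` and the usual sparse operations work as they did before saving.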

Is 418326x223957 the shape of the full training dataset, and (10, 121) the shape of a reduced debugging set?
