Question

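I am writing code to efficiently remove multiple columns from several large, parallel scipy sparse.csc matrices at once (parallel meaning that all matrices have the same dimensions and all nnz elements are in the same positions). I do this by indexing only the columns to keep on one matrix, then reusing the resulting indices and indptr arrays for the other matrices. However, when I index a csc matrix with a list, the data array gets reordered, so the indices cannot be reused. Is there a way to force scipy to keep the data array in its original order? And why does the reordering only happen when indexing with a list?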

import scipy.sparse
import numpy as np
mat = scipy.sparse.csc_matrix(np.array([[1,0,0,0,2,5], 
                                        [1,0,1,0,0,0], 
                                        [0,0,0,4,0,1],
                                        [0,3,0,1,0,4]]))
print(mat[:,3].data)

returns array([4, 1])

print(mat[:,[3]].data)

returns array([1, 4])
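The approach described in the question can be sketched directly on the raw CSC arrays, skipping fancy indexing altogether. This is a minimal sketch, assuming every matrix has exactly the same indptr and indices in the same storage order (the "parallel" case) and that only whole columns are dropped. `drop_columns_parallel` is a hypothetical helper, not a scipy function. Because it selects stored elements with a boolean mask, the original data order is preserved by construction, and the new structure is computed once and reused for every matrix.

```python
import numpy as np
import scipy.sparse as sp


def drop_columns_parallel(mats, drop):
    """Hypothetical helper: drop the columns in `drop` from CSC matrices
    that share one sparsity structure (same shape, indptr and indices)."""
    ref = mats[0]
    nrows, ncols = ref.shape
    # Cheap sanity check of the "parallel" assumption.
    for m in mats[1:]:
        if not (np.array_equal(m.indptr, ref.indptr)
                and np.array_equal(m.indices, ref.indices)):
            raise ValueError("matrices do not share the same structure")

    keep = np.setdiff1d(np.arange(ncols), drop)  # kept column ids, ascending

    # Column id of every stored element, in storage order.
    counts = np.diff(ref.indptr)
    col_of_elem = np.repeat(np.arange(ncols), counts)
    mask = np.isin(col_of_elem, keep)

    # Structure computed once; boolean masking keeps the storage order.
    new_indices = ref.indices[mask]
    new_indptr = np.concatenate(([0], np.cumsum(counts[keep])))
    shape = (nrows, len(keep))

    # Copy the shared structure so later in-place ops on one result
    # (e.g. sort_indices) cannot affect the others.
    return [sp.csc_matrix((m.data[mask], new_indices.copy(), new_indptr.copy()),
                          shape=shape)
            for m in mats]


A = sp.csc_matrix(np.array([[1, 0, 0, 0, 2, 5],
                            [1, 0, 1, 0, 0, 0],
                            [0, 0, 0, 4, 0, 1],
                            [0, 3, 0, 1, 0, 4]]))
# A second matrix with the same sparsity pattern but different values.
B = sp.csc_matrix((A.data * 10, A.indices.copy(), A.indptr.copy()), shape=A.shape)

A2, B2 = drop_columns_parallel([A, B], drop=[1, 4])
print(A2.data)  # [1 1 1 4 1 5 1 4]
```

The mask is built from indptr alone, so the cost is linear in nnz and no sparse matrix product is involved.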

Answer 1


The example matrix and its CSC storage arrays:

In [43]: mat = sparse.csc_matrix(np.array([[1,0,0,0,2,5],[1,0,1,0,0,0],
    ...:                                   [0,0,0,4,0,1],[0,3,0,1,0,4]]))
In [44]: mat
Out[44]:
<4x6 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Column format>
In [45]: mat.data
Out[45]: array([1, 1, 3, 1, 4, 1, 2, 5, 1, 4], dtype=int64)
In [46]: mat.indices
Out[46]: array([0, 1, 3, 1, 2, 3, 0, 0, 2, 3], dtype=int32)
In [47]: mat.indptr
Out[47]: array([ 0,  2,  3,  4,  6,  7, 10], dtype=int32)
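In CSC storage, column j occupies the range indptr[j]:indptr[j+1] of both data and indices. A small NumPy sketch using the arrays printed above; `column` is a hypothetical helper for illustration, not a scipy method.

```python
import numpy as np

# CSC arrays copied from Out[45]..Out[47] above.
data = np.array([1, 1, 3, 1, 4, 1, 2, 5, 1, 4])
indices = np.array([0, 1, 3, 1, 2, 3, 0, 0, 2, 3])
indptr = np.array([0, 2, 3, 4, 6, 7, 10])


def column(j):
    """Return (row indices, values) stored for column j, in storage order."""
    start, stop = indptr[j], indptr[j + 1]
    return indices[start:stop], data[start:stop]


rows, vals = column(3)
print(rows, vals)  # [2 3] [4 1]
```

Column 3 spans positions 4:6, which is exactly the data and indices that the scalar selection mat[:,3] returns.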

Scalar selection:

In [48]: m1 = mat[:,3]
In [49]: m1
Out[49]:
<4x1 sparse matrix of type '<class 'numpy.int64'>'
    with 2 stored elements in Compressed Sparse Column format>
In [50]: m1.data
Out[50]: array([4, 1])
In [51]: m1.indices
Out[51]: array([2, 3], dtype=int32)
In [52]: m1.indptr
Out[52]: array([0, 2], dtype=int32)

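Indexing a csc matrix with a list uses matrix multiplication (at least in the SciPy version used for this session): it builds an extractor matrix from the index and then takes a dot product with it. The result is therefore a brand-new sparse matrix, not just a subset of the csc data and indices attributes, and its stored entries can come out in a different order.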
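A rough sketch of the extractor idea described above. This is an illustration, not SciPy's internal code, and `extract_columns` is a hypothetical name. A 0/1 selector matrix E with E[cols[i], i] = 1 makes mat @ E equal to mat[:, cols] in value, but the product is freshly assembled by the multiplication routine, so its internal storage order should not be relied on.

```python
import numpy as np
import scipy.sparse as sp


def extract_columns(mat, cols):
    """Hypothetical helper: select columns `cols` of `mat` by multiplying
    with a selector matrix, mimicking the extractor approach."""
    cols = np.asarray(cols)
    k = len(cols)
    # E[cols[i], i] = 1, so column i of (mat @ E) is column cols[i] of mat.
    E = sp.csc_matrix(
        (np.ones(k, dtype=mat.dtype), (cols, np.arange(k))),
        shape=(mat.shape[1], k),
    )
    return (mat @ E).tocsc()


mat = sp.csc_matrix(np.array([[1, 0, 0, 0, 2, 5],
                              [1, 0, 1, 0, 0, 0],
                              [0, 0, 0, 4, 0, 1],
                              [0, 3, 0, 1, 0, 4]]))
sub = extract_columns(mat, [3])
print(sub.toarray().ravel())  # [0 0 4 1]
```

The values match mat[:,3]; only the layout of sub.data and sub.indices is left to the product routine.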

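A csc matrix has a method, sort_indices(), that sorts the row indices within each column in place (the has_sorted_indices attribute reports whether they are already sorted). Applying it to every result should ensure that all the arrays end up ordered the same way.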
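A short check of that suggestion, assuming a SciPy version where spmatrix indexing returns 2-D csc matrices. Depending on the version, mat[:, [3]] may or may not come back with its entries in a different order; either way, sort_indices() puts both results into the same canonical (row-sorted) order.

```python
import numpy as np
import scipy.sparse as sp

mat = sp.csc_matrix(np.array([[1, 0, 0, 0, 2, 5],
                              [1, 0, 1, 0, 0, 0],
                              [0, 0, 0, 4, 0, 1],
                              [0, 3, 0, 1, 0, 4]]))

m_scalar = mat[:, 3]   # scalar column selection
m_list = mat[:, [3]]   # list (fancy) column selection

# Canonicalise the storage order of both results in place.
for m in (m_scalar, m_list):
    m.sort_indices()

print(m_scalar.data, m_list.data)  # [4 1] [4 1]
print(m_list.has_sorted_indices)   # True
```

Note that this still indexes each matrix separately; sorting only guarantees that the resulting indices and indptr agree across matrices with the same sparsity pattern.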

Why doesn't scipy.sparse.csc_matrix preserve the index order of my np.array?
