Question

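I have a large numpy array (typically 500,000x1024, but it can be larger), and I am trying to perform a couple of processes that depend on where the positive values in the array are. A very small example array might be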

  [[ 0., 0., 0., 0., 0.,-1.,-1., 0., 0.],
   [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   [ 0., 1., 1., 0., 0., 1., 5., 0., 0.],
   [ 0., 1., 1., 0., 0., 0., 1., 0., 0.],
   [ 0., 3., 1., 0., 0., 2., 1., 0., 0.],
   [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   [ 0., 1., 0., 0., 0., 1., 1., 0., 0.],
   [ 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

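The first is to replace any zeros that lie between positive values less than three columns apart within each row. So, if I replace those zeros with 50, my example output would be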

 [[ 0., 0., 0., 0., 0.,-1.,-1., 0., 0.],
  [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
  [ 0., 1., 1.,50.,50., 1., 5., 0., 0.],
  [ 0., 1., 1., 0., 0., 0., 1., 0., 0.],
  [ 0., 3., 1.,50.,50., 2., 1., 0., 0.],
  [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
  [ 0., 1., 0., 0., 0., 1., 1., 0., 0.],
  [ 0., 0., 0., 0., 0., 0., 0., 0., 0.]]

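The second thing I need to do is write out some information for each row based on where the ranges of positive values are. For example, using my altered array, I need to be able to write out one statement for the third row declaring positive integers for col[1:7], and two statements for the fourth row declaring positive integers in col[1:3] and col[6].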

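I have managed to use numpy vectorised methods to solve the first task to some extent, but I still end up having to loop through columns and rows (albeit on a subset of the entire array). Otherwise I end up replacing all the zeros in a given row instead of just those between positive values.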

But for the second task I can't seem to find a way to do it without cycling through the entire array using

for col in arr:
  for row in arr:

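I guess my overall question is: is there a way to use vectorised methods in numpy to define column index ranges that differ for each row and depend on the value in the next column?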

Any help would be much appreciated.

Answer 1

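Unfortunately, Numpy can't do much processing without generating more arrays, so I fear any solution will require either some form of manual loop like the one you've been using, or the creation of one or more additional large arrays. You might be able to come up with a solution that is quite fast and memory-efficient using numexpr.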

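Here's a stab at doing this in a way that isn't necessarily memory-efficient, but at least all the looping is done by Numpy, so it should be a lot faster than what you've been doing, as long as it fits in your memory. (Memory efficiency might be improved by rewriting some of this as in-place operations, but I won't worry about that.)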

Here's your step 1:

positive = x>0 # a boolean array marking the positive values in x

positive0 = positive[:,0:-3] # all but last 3 columns 
positive1 = positive[:,1:-2] # all but 1st and last 2 columns; not actually used
positive2 = positive[:,2:-1] # all but first 2 and last 1 columns
positive3 = positive[:,3:  ] # all but first 3 columns

# In the following, the suffix 1 indicates that we're viewing things from the perspective
# of entries in positive1 above.  So, e.g., has_pos_1_to_left1 will be True at
# any position where an entry in positive1 would be preceded by a positive entry in x

has_pos_1_to_left1 = positive0
has_pos_1_or_2_to_right1 = positive2 | positive3
flanked_by_positives1 = has_pos_1_to_left1 & has_pos_1_or_2_to_right1

zeros = (x == 0)       # indicates everywhere x is 0
zeros1 = zeros[:,1:-2] # all but 1st and last 2 columns

x1 = x[:,1:-2]         # all but 1st and last 2 columns

x1[zeros1 & flanked_by_positives1] = 50 # fill in zeros that were flanked - overwrites x!

# The preceding didn't address the next-to-last column, because we couldn't
# look two slots to the right of it without causing an error.  It needs special treatment
# (note the elementwise |; Python's `or` raises ValueError on arrays):
x[:,-2][ zeros[:,-2] & positive[:,-1] & (positive[:,-4] | positive[:,-3]) ] = 50
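As written, the slicing above fills a zero only when its immediate left neighbour is positive, so the second zero of a two-wide gap appears to be left alone after a single pass. Here is a minimal sketch of one way to cover both zeros at once, by padding the boolean mask so that every shift stays in bounds. The function name `fill_short_gaps` and its `fill` parameter are hypothetical, and the sketch assumes that a negative value inside a gap is left untouched but does not stop the zeros next to it from being filled.

```python
import numpy as np

def fill_short_gaps(x, fill=50):
    """Fill zeros lying in a gap of at most two columns between positives (in place)."""
    pos = x > 0
    # Two False columns on each side keep every shifted view below in bounds.
    p = np.pad(pos, ((0, 0), (2, 2)), constant_values=False)
    left1, left2 = p[:, 1:-3], p[:, :-4]     # positive 1 or 2 columns to the left
    right1, right2 = p[:, 3:-1], p[:, 4:]    # positive 1 or 2 columns to the right
    # Gap of one: left1 & right1.  Gap of two: left1 & right2, or left2 & right1.
    flanked = (left1 & (right1 | right2)) | (left2 & right1)
    x[(x == 0) & flanked] = fill
    return x
```

Like the slicing version, this builds several temporary boolean arrays the size of x, so it trades memory for speed.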

Here's your step 2:

import numpy

filled_positives = (x>0).astype(numpy.int8) # assuming we just filled in x; cast to int because
                                     # diff on a boolean array yields booleans, never -1
diffs = numpy.diff(filled_positives, axis=1) # will be 1 just before the first positive in any
                                     # sequence, -1 at the last positive, zero elsewhere

endings = numpy.where(diffs==-1) # tuple specifying coords where positive sequences end 
                                 # omits final column!!!
beginnings = numpy.where(diffs==1) # tuple specifying coords where pos seqs about to start
                                   # omits column #0!!!

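It should be straightforward to use these beginning and ending coordinates to extract the per-row information you need. Remember, though, that this difference-detection method only catches transitions from non-positive to positive or vice versa, so it won't report positive sequences that begin in column 0 or end in the final column. If you need those, you'll have to look for these non-transitions separately.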
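One possible way to avoid the separate edge-column check is to pad the mask with a non-positive column on each side before differencing, so that runs touching either edge also show up as transitions. The sketch below assumes that approach. The function name `positive_runs` is hypothetical, and the half-open `[start, stop)` convention is chosen to match the col[1:7] notation in the question.

```python
import numpy as np

def positive_runs(x):
    """Return (rows, starts, stops); x[row, start:stop] is a maximal run of positives."""
    pos = (x > 0).astype(np.int8)
    # A zero column on each side turns runs at either edge into ordinary transitions.
    padded = np.pad(pos, ((0, 0), (1, 1)))
    d = np.diff(padded, axis=1)            # shape (n_rows, n_cols + 1)
    rows, starts = np.nonzero(d == 1)      # a run begins at this column
    _, stops = np.nonzero(d == -1)         # a run ends just before this column
    # nonzero scans in row-major order and every row has as many starts as stops,
    # so the i-th start pairs with the i-th stop.
    return rows, starts, stops
```

Each triple `(rows[i], starts[i], stops[i])` can then be formatted into one statement per run.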

Answer 2

You could use efficient numpy iterators such as flatiter or nditer.

For instance, for your second task:

In [1]: from numpy import array
   ...: x = array([[ 0., 0., 0., 0., 0.,-1.,-1., 0., 0.],
   ...:            [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   ...:            [ 0., 1., 1.,50.,50., 1., 5., 0., 0.],
   ...:            [ 0., 1., 1., 0., 0., 0., 1., 0., 0.],
   ...:            [ 0., 3., 1.,50.,50., 2., 1., 0., 0.],
   ...:            [ 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   ...:            [ 0., 1., 0., 0., 0., 1., 1., 0., 0.],
   ...:            [ 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [2]: islands = []
   ...: fl = x.flat
   ...: while fl.index < x.size:
   ...:     coord = fl.coords
   ...:     if next(fl) > 0:
   ...:         length = 1
   ...:         # stop at the end of the row (or of the array) so runs never wrap
   ...:         while fl.index < x.size and fl.coords[0] == coord[0] and next(fl) > 0:
   ...:             length += 1
   ...:         islands.append([coord, length])

In [3]: for (row, col), length in islands:
   ...:     print('row:%d ; col[%d:%d]' % (row, col, col+length))
row:2 ; col[1:7]
row:3 ; col[1:3]
row:3 ; col[6:7]
row:4 ; col[1:7]
row:6 ; col[1:2]
row:6 ; col[5:7]
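The session above uses flatiter. For comparison, here is a rough sketch of the same island search with nditer and its `multi_index` flag. The function name `islands_nditer` is hypothetical, and `order='C'` is an assumption made so that the iteration is row by row regardless of the array's memory layout.

```python
import numpy as np

def islands_nditer(x):
    """Collect [row, start, stop] for each run of positive values, row by row."""
    islands = []
    current = None                          # the run currently being extended
    it = np.nditer(x, flags=['multi_index'], order='C')
    for value in it:
        row, col = it.multi_index
        if value > 0:
            if current is not None and current[0] == row and current[2] == col:
                current[2] = col + 1        # extend the open run by one column
            else:
                current = [row, col, col + 1]
                islands.append(current)     # a new run starts here
        else:
            current = None                  # a non-positive value closes any open run
    return islands
```

This is still a Python-level loop over every element, so on a 500,000x1024 array it is likely to be much slower than a diff-based approach.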

Answer 3

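For your first problem: create a variable that holds the index of the first positive number you encounter, and use an if statement that resets that position when the next value is positive and count (a variable that counts positions away from the first positive number) is less than 3.

For your second problem: create an array and add the indices of the positions of the positive values.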

indices = []
for row, values in enumerate(arr):
    for col, value in enumerate(values):
        if value > 0:  # the entry at this index is positive
            indices.append("[%d:%d]" % (row, col))
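The counter idea for the first problem can be sketched for a single row as follows. This is only an illustration under stated assumptions: the function name `fill_row_gaps` and the `max_gap` and `fill` parameters are hypothetical, and a gap counts as fillable when at most two columns separate two positives, which is how the question's example behaves.

```python
def fill_row_gaps(row, max_gap=2, fill=50):
    """Fill zeros between positives separated by at most max_gap columns (in place)."""
    last_pos = None                        # index of the most recent positive value
    for i, value in enumerate(row):
        if value > 0:
            gap = i - last_pos - 1 if last_pos is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_pos + 1, i):
                    if row[j] == 0:        # only zeros are replaced
                        row[j] = fill
            last_pos = i                   # reset the position to this positive
    return row
```

It works on a plain list or on one row of a numpy array, but calling it for each of 500,000 rows is a pure-Python loop and will be slow compared with a vectorised version.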

Answer 4

A second approach would be to turn the data into objects, so suppose you have a class:

public class Matrix{
   int indicex;
   int indicey;
   double val;
   boolean positiveInt;

   //constructor
   public Matrix(int indicex, int indicey, double val, boolean positiveInt){
      this.indicex = indicex;
      this.indicey = indicey;
      this.val = val;
      this.positiveInt = positiveInt;
   }

   //getter
   public boolean isPositive(){
      return positiveInt;
   }
}

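Then in your driver class you would read in your data and create an object with new Matrix(indexx, indexy, val, true/false), which is then put into an ArrayList that you can search for positive numbers.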

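List<Matrix> storeObjects = new ArrayList<Matrix>();

// call this for each value as you read in your data
void store(int indexx, int indexy, double val){
   Matrix matrixObject = new Matrix(indexx, indexy, val, val > 0);
   storeObjects.add(matrixObject);
}

// then, inside a method, collect the positive objects in a separate list
List<Matrix> positiveObjects = new ArrayList<Matrix>();
for(Matrix object : storeObjects){
   if(object.isPositive()){
      positiveObjects.add(object);
   }
}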

Efficiently finding the index ranges of positive values in a 2D numpy array
