Question

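I have a matrix whose rows need to be centralised. In other words, each row has trailing zeros at both ends, with the actual data sitting between them. However, I need the number of zeros at both ends to be equal; put differently, what I call the data (the values between the trailing zeros) should be centred on the middle of the row. Here is an example: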

array:
[[0, 1, 2, 0, 2, 1, 0, 0, 0],
 [2, 1, 1, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 0, 0, 2, 0, 0, 0]]

centred_array:
[[0, 0, 1, 2, 0, 2, 1, 0, 0],
 [0, 0, 0, 2, 1, 1, 0, 0, 0],
 [0, 0, 1, 0, 0, 2, 0, 0, 0]]

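I hope that explains it well enough for you to see some of the issues I am having. First, the size of the "data" is not guaranteed to be even or odd, so the function needs to pick a centre consistently when the data cannot be centred exactly. The same goes for the rows: a row may have an even length, in which case one position has to be chosen as the centre.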

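EDIT: I should note that I already have a function that does this; the problem is that I can have 10^3 rows to centralise and my function is too slow, so efficiency would really help.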
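To make the tie-breaking concrete, here is a minimal sketch of one possible convention: put the floor of half the spare zeros on the left and the remainder on the right. The helper name `split_padding` is made up for illustration; the rule itself is the one the trim-and-pad loop in this question uses.

```python
def split_padding(row_length, data_width):
    """Return (left, right) zero counts for centring data_width values
    in a row of row_length, with any extra zero going to the right."""
    total = row_length - data_width   # zeros that must be distributed
    left = total // 2                 # floor half on the left
    right = total - left              # remainder on the right
    return left, right

print(split_padding(9, 5))  # (2, 2): centres exactly
print(split_padding(9, 4))  # (2, 3): odd spare count, extra zero on the right
print(split_padding(8, 3))  # (2, 3): even row length, odd data width
```

Any consistent rule would do; what matters is that the same rule is applied to every row.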

@HYRY

import numpy as np

a = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
              [2, 1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 2, 0, 0, 0]])
cd = []
(x, y) = np.shape(a)
for row in a:
    trim = np.trim_zeros(row)      # strip leading and trailing zeros
    to_add = y - np.size(trim)     # total number of zeros to put back
    left = to_add // 2             # integer division, so np.pad receives ints
    right = to_add - left
    cd.append(np.pad(trim, (left, right), 'constant', constant_values=(0, 0)).tolist())
result = np.array(cd)
print(result)

[[0 0 1 2 0 2 1 0 0]
 [0 0 0 2 1 1 0 0 0]
 [0 0 1 0 0 2 0 0 0]]

Answer 1

import numpy as np

def centralise(arr):
    # Find the x and y indexes of the nonzero elements:
    x, y = arr.nonzero()

    # Find the index of the left-most and right-most elements for each row:
    nonzeros = np.bincount(x)
    nonzeros_idx = nonzeros.cumsum()
    left = y[np.r_[0, nonzeros_idx[:-1]]]
    right = y[nonzeros_idx-1]

    # Calculate how much each y has to be shifted
    shift = ((arr.shape[1] - (right-left) - 0.5)//2 - left).astype(int)
    shift = np.repeat(shift, nonzeros) 
    new_y = y + shift

    # Create centered_arr
    centered_arr = np.zeros_like(arr)
    centered_arr[x, new_y] = arr[x, y]
    return centered_arr

arr = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
                [2, 1, 1, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 2, 0, 0, 0]])
print(centralise(arr))

yields

[[0 0 1 2 0 2 1 0 0]
 [0 0 0 2 1 1 0 0 0]
 [0 0 1 0 0 2 0 0 0]]

A benchmark comparing the original code with centralise:

def orig(a):
    cd = []
    (x, y) = np.shape(a)
    for row in a:
        trim = np.trim_zeros(row)
        to_add = y - np.size(trim)
        left = to_add // 2  # integer division, so np.pad receives ints
        right = to_add - left
        cd.append(np.pad(trim, (left, right), 'constant', constant_values=(0, 0)).tolist())
    result = np.array(cd)
    return result

In [481]: arr = np.tile(arr, (1000, 1))

In [482]: %timeit orig(arr)
10 loops, best of 3: 140 ms per loop

In [483]: %timeit centralise(arr)
1000 loops, best of 3: 537 µs per loop

In [486]: (orig(arr) == centralise(arr)).all()
Out[486]: True
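One thing to watch: the bincount/cumsum indexing in centralise assumes every row contains at least one nonzero value. The following is a sketch of an alternative, not part of the original answer, that finds the first and last nonzero column of each row with argmax on a boolean mask, so all-zero rows are simply left as zeros. The name `centralise_safe` is hypothetical, and the code assumes a 2-D array.

```python
import numpy as np

def centralise_safe(arr):
    """Centre the nonzero span of each row; all-zero rows stay all-zero."""
    arr = np.asarray(arr)
    n = arr.shape[1]
    mask = arr != 0

    # First and last nonzero column per row (meaningless for empty rows).
    left = mask.argmax(axis=1)
    right = n - 1 - mask[:, ::-1].argmax(axis=1)

    # Same floor rule as trim-and-pad: left padding = (n - width) // 2.
    width = right - left + 1
    shift = (n - width) // 2 - left
    shift[~mask.any(axis=1)] = 0  # empty rows contribute nothing anyway

    # Scatter every nonzero value to its shifted column.
    rows, cols = np.nonzero(mask)
    out = np.zeros_like(arr)
    out[rows, cols + shift[rows]] = arr[rows, cols]
    return out

arr = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0],
                [2, 1, 1, 0, 0, 0, 0, 0, 0]])
print(centralise_safe(arr))
```

Like centralise, this avoids a Python-level loop over rows; its speed has not been benchmarked here.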

Answer 2

If your array only has 10^3 rows, you can probably afford a Python loop if you would like a more explicit solution:

import numpy as np

a = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
              [2, 1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 2, 0, 0, 0]])

for i, r in enumerate(a):
    w = np.where(r!=0)[0]
    nend = len(r) - w[-1] - 1
    nstart = w[0]
    shift = (nend - nstart)//2
    a[i] = np.roll(r, shift)

print(a)

gives:

[[0 0 1 2 0 2 1 0 0]
 [0 0 0 2 1 1 0 0 0]
 [0 0 1 0 0 2 0 0 0]]
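The loop above modifies a in place and indexes w[-1], which raises an IndexError for a row with no nonzero values. As a hedged variant, the same np.roll idea can be wrapped in a function that works on a copy and skips empty rows; the function name `centre_rows_roll` is made up for this sketch.

```python
import numpy as np

def centre_rows_roll(a):
    """Return a copy of a with each row's nonzero span centred via np.roll."""
    out = np.array(a, copy=True)
    for i, r in enumerate(out):
        w = np.flatnonzero(r)          # indices of nonzero entries
        if w.size == 0:
            continue                   # nothing to centre in an all-zero row
        nstart = w[0]                  # zeros before the data
        nend = len(r) - w[-1] - 1      # zeros after the data
        out[i] = np.roll(r, (nend - nstart) // 2)
    return out

a = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
              [2, 1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 2, 0, 0, 0]])
print(centre_rows_roll(a))
```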

Answer 3

A solution using np.apply_along_axis:

import numpy as np

def centerRow(a):
  i = np.nonzero(a != 0)  # indices of the nonzero entries
  ifirst = i[0][0]
  ilast = i[0][-1]
  count = ilast-ifirst+1
  padleft = (np.size(a) - count) // 2  # integer division: repeat counts must be ints
  padright = np.size(a) - padleft - count
  b = np.r_[np.repeat(0, padleft), a[ifirst:ilast+1], np.repeat(0, padright)]
  return b

arr = np.array(
[[0, 1, 2, 0, 2, 1, 0, 0, 0],
 [2, 1, 1, 0, 0, 0, 0, 0, 0],
 [0, 0, 1, 0, 0, 2, 0, 0, 0]]
  )

barr = np.apply_along_axis(centerRow, 1, arr)
print(barr)

Answer 4

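Algorithm:

Find the positions of the non-zero elements in the row of length n
Find the difference d between the first and last non-zero elements
Store the meaningful vector x, taken from the span of the row given by d
Find the midpoint d_m of d; if it is even, take the right element
Find the midpoint n_m of the row length; if it is even, pick the right one
Subtract d_m from n_m, and place x at this position in a zero row of length n
Repeat for all rows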

Quick Octave prototype (Python version coming soon):

mat = [[0, 1, 2, 0, 2, 1, 0, 0, 0],
       [2, 1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 2, 0, 0, 0]];

newMat = zeros(size(mat)); %new matrix to be filled
n = size(mat, 2);

for i = 1:size(mat,1)
    newRow = newMat(i,:);
    nonZeros = find(mat(i,:));

    x = mat(i, nonZeros(1):nonZeros(end));
    d = nonZeros(end)- nonZeros(1);
    d_m = ceil(d/2);
    n_m = ceil(n/2);

    newRow(n_m-d_m:n_m-d_m+d) = x;
    newMat(i,:) = newRow;
end

newMat
> [[0 0 1 2 0 2 1 0 0]
   [0 0 0 2 1 1 0 0 0]
   [0 0 1 0 0 2 0 0 0]]
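For readers who want the steps above in Python, the following is a rough sketch rather than the answer author's own port. It follows the same outline (locate the nonzero span, copy it into a zero row) but, as an assumption made here, computes the start column with the floor rule (n - len(x)) // 2 instead of the ceil-based midpoints n_m and d_m. The two agree for odd row lengths such as the 9-column example; for even row lengths the floor rule keeps the padding balanced and the index in range. The function name `centre_rows` is hypothetical.

```python
import numpy as np

def centre_rows(mat):
    """Place each row's nonzero span in the middle of a zero row."""
    mat = np.asarray(mat)
    n = mat.shape[1]
    new_mat = np.zeros_like(mat)        # new matrix to be filled
    for i, row in enumerate(mat):
        non_zeros = np.flatnonzero(row)
        if non_zeros.size == 0:
            continue                    # leave all-zero rows untouched
        # Meaningful vector x: first to last nonzero element, inclusive.
        x = row[non_zeros[0]:non_zeros[-1] + 1]
        start = (n - x.size) // 2       # floor rule, 0-based start column
        new_mat[i, start:start + x.size] = x
    return new_mat

mat = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
                [2, 1, 1, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 2, 0, 0, 0]])
print(centre_rows(mat))
```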

Centering data in numpy

4 answers: