Vectorized way of filling a numpy array

Time: 2019-03-01 10:46:12

Tags: python arrays numpy vectorization

I have a binary string s, for example 001010. I want to convert it into a numpy array a such that if s[i] == '0', then a[i] = np.array([[1], [0]]); otherwise, a[i] = np.array([[0], [1]]).

So I wrote code like this:

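import numpy as np

a = np.zeros([len(s), 2, 1])
for i in range(len(s)):
    if s[i] == '0':
        a[i] = np.array([[1], [0]])
    else:
        a[i] = np.array([[0], [1]])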

Can this be rewritten in vectorized form, without the for loop?

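My expected output is an array of shape (len(s), 2, 1). For s = '001010', its six blocks are:

[[1],[0]], [[1],[0]], [[0],[1]], [[1],[0]], [[0],[1]], [[1],[0]]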

2 Answers:

Answer 0 (score: 5)

Approach #1: Here's one with NumPy char arrays -

sa = np.frombuffer(s.encode(),dtype='S1')  # on Python 3, frombuffer needs bytes
out = np.where(sa[:,None,None]==b'0',[[1],[0]],[[0],[1]])
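The trick in Approach #1 is broadcasting. The condition has shape (n, 1, 1) and each choice has shape (2, 1), so np.where produces an (n, 2, 1) result. The sketch below traces those shapes on the question's example. It assumes Python 3 and an ASCII-only string, which is why it calls s.encode('ascii') and compares against b'0'.

```python
import numpy as np

s = '001010'
# View the encoded bytes as an array of one-byte strings: b'0', b'0', b'1', ...
sa = np.frombuffer(s.encode('ascii'), dtype='S1')
print(sa.shape)                 # (6,)
print(sa[:, None, None].shape)  # (6, 1, 1)
# The (6, 1, 1) condition broadcasts against the (2, 1) choices, giving (6, 2, 1)
out = np.where(sa[:, None, None] == b'0', [[1], [0]], [[0], [1]])
print(out.shape)                # (6, 2, 1)
print(out[:, :, 0])
```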

Approach #2: A one-liner -

((np.frombuffer(s.encode(),dtype=np.uint8)[:,None]==[48,49])[...,None]).astype(float)
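In Approach #2, the constants 48 and 49 are the ASCII codes of '0' and '1'. Comparing each byte against both codes at once yields a one-hot boolean row per character. The following is a step-by-step sketch, again assuming Python 3 and an ASCII-only input string.

```python
import numpy as np

s = '001010'
# Read the raw ASCII codes of the characters
codes = np.frombuffer(s.encode('ascii'), dtype=np.uint8)
print(codes)                # [48 48 49 48 49 48]
print(ord('0'), ord('1'))   # 48 49
# (6, 1) codes compared with the (2,) targets broadcast to (6, 2) booleans
hits = codes[:, None] == [48, 49]
print(hits.shape)           # (6, 2)
# Add a trailing axis and convert True/False to 1.0/0.0
out = hits[..., None].astype(float)
print(out.shape)            # (6, 2, 1)
```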

Approach #3: A final one, focused purely on performance -

a = np.zeros([len(s), 2, 1])
idx = np.frombuffer(s.encode(),dtype=np.uint8)-48  # '0' -> 0, '1' -> 1
a[np.arange(len(idx)),idx] = 1
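Approach #3 avoids building any comparison array. It turns each character into a row index (0 or 1) and writes the 1s directly with integer-array indexing. The sketch below shows the intermediate index array, assuming Python 3 and ASCII-only input.

```python
import numpy as np

s = '001010'
a = np.zeros([len(s), 2, 1])
# ASCII '0' is 48, so subtracting 48 maps '0' -> 0 and '1' -> 1
idx = np.frombuffer(s.encode('ascii'), dtype=np.uint8) - 48
print(idx)  # [0 0 1 0 1 0]
# Paired index arrays select one (i, idx[i]) slot per character; the last axis
# is left as a slice, so this sets a[i, idx[i], 0] = 1
a[np.arange(len(idx)), idx] = 1
print(a[:, :, 0])
```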

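Timings on a 100000-character string. This session passes s straight to np.frombuffer, which only works where str is a byte string (Python 2). On Python 3, pass s.encode() instead -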

In [2]: np.random.seed(0)

In [3]: s = ''.join(map(str,np.random.randint(0,2,(100000)).tolist()))

# @yatu's soln
In [4]: %%timeit
     ...: a = np.array(list(s), dtype=int)
     ...: np.where(a==0, np.array([[1], [0]]), np.array([[0], [1]])).T[:,:,None]
10 loops, best of 3: 36.3 ms per loop

# App#1 from this post    
In [5]: %%timeit
     ...: sa = np.frombuffer(s,dtype='S1')
     ...: out = np.where(sa[:,None,None]=='0',[[1],[0]],[[0],[1]])
100 loops, best of 3: 3.56 ms per loop

# App#2 from this post    
In [6]: %timeit ((np.frombuffer(s,dtype=np.uint8)[:,None]==[48,49])[...,None]).astype(float)
1000 loops, best of 3: 1.81 ms per loop

# App#3 from this post    
In [7]: %%timeit
     ...: a = np.zeros([len(s), 2, 1])
     ...: idx = np.frombuffer(s,dtype=np.uint8)-48
     ...: a[np.arange(len(idx)),idx] = 1
1000 loops, best of 3: 1.81 ms per loop
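Timings are only meaningful if every variant returns the same array. The sketch below checks that they agree on a random string. It assumes Python 3, and the wrapper names approach1, approach2, approach3 and list_where are hypothetical labels for the snippets above. It does not reproduce the timings.

```python
import numpy as np

def approach1(s):
    # Char-array comparison plus broadcasting in np.where
    sa = np.frombuffer(s.encode('ascii'), dtype='S1')
    return np.where(sa[:, None, None] == b'0', [[1], [0]], [[0], [1]])

def approach2(s):
    # Compare ASCII codes against 48 ('0') and 49 ('1')
    codes = np.frombuffer(s.encode('ascii'), dtype=np.uint8)
    return ((codes[:, None] == [48, 49])[..., None]).astype(float)

def approach3(s):
    # Scatter 1s into a preallocated zeros array
    a = np.zeros([len(s), 2, 1])
    idx = np.frombuffer(s.encode('ascii'), dtype=np.uint8) - 48
    a[np.arange(len(idx)), idx] = 1
    return a

def list_where(s):
    # The list(s) + np.where answer
    a = np.array(list(s), dtype=int)
    return np.where(a == 0, np.array([[1], [0]]), np.array([[0], [1]])).T[:, :, None]

np.random.seed(0)
s = ''.join(map(str, np.random.randint(0, 2, 1000).tolist()))
ref = approach3(s)
# array_equal compares values, so int and float results can match
for f in (approach1, approach2, list_where):
    assert np.array_equal(f(s), ref), f.__name__
```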

Answer 1 (score: 3)

A simple approach is to build a list from the string and then convert that list into an integer np.array by specifying dtype=int:

s = '001010'

a = np.array(list(s), dtype=int)
# array([0, 0, 1, 0, 1, 0])
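This first step splits the string into one-character strings and lets NumPy parse each one as an integer. The minimal sketch below uses the question's example string.

```python
import numpy as np

s = '001010'
chars = list(s)
print(chars)  # ['0', '0', '1', '0', '1', '0']
# dtype=int makes NumPy parse each one-character string as an integer
a = np.array(chars, dtype=int)
print(a)      # [0 0 1 0 1 0]
```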

Then use np.where to choose np.array([[1], [0]]) where a is 0 and np.array([[0], [1]]) otherwise. Finally, transpose and add a trailing axis:

np.where(a==0, np.array([[1], [0]]), np.array([[0], [1]])).T[:,:,None]
array([[[1],
        [0]],

       [[1],
        [0]],

       [[0],
        [1]],

       [[1],
        [0]],

       [[0],
        [1]],

       [[1],
        [0]]])
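Note why the .T[:,:,None] is needed. The (6,) condition broadcasts against the (2, 1) choices, so np.where returns shape (2, 6), with one column per character. Transposing and adding an axis then gives the requested (6, 2, 1). The sketch below prints each intermediate shape for the question's example.

```python
import numpy as np

s = '001010'
a = np.array(list(s), dtype=int)
# (6,) condition vs (2, 1) choices broadcast to (2, 6)
w = np.where(a == 0, np.array([[1], [0]]), np.array([[0], [1]]))
print(w.shape)    # (2, 6)
print(w.T.shape)  # (6, 2)
# Trailing axis turns each row [x, y] into the column [[x], [y]]
out = w.T[:, :, None]
print(out.shape)  # (6, 2, 1)
```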