How to speed up a one-hot encoder

Time: 2017-02-19 22:28:27

Tags: python performance numpy

I wrote a simple function that takes a vector as input and returns its one-hot encoded matrix.

import numpy as np

def ohc(x):
    u = list(set(x))
    c = len(u)
    X = np.zeros((len(x), c))
    for idx, val in enumerate(x):
        for i in range(c):
            if val == u[i]:
                X[idx, i] = 1
    return X

inputx = np.random.randint(1, 4, 1000000)
ohc(inputx) 
Out[2]: 
array([[ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       ..., 
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.]])

But I'm wondering: given the two nested for loops, is there any way to speed it up?

     1000006 function calls in 1.102 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.930    0.930    1.102    1.102 <ipython-input-32-fcf6d323f906>:1(ohc)
    1    0.000    0.000    1.102    1.102 <string>:1(<module>)
    2    0.000    0.000    0.000    0.000 {len}
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
  1000000    0.172    0.000    0.172    0.000 {range}

3 Answers:

Answer 0 (score: 4)

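Here's a vectorized approach that compares the original array against just its unique values (obtained with np.unique) to get the one-hot encoded array -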

(inputx[:,None] == np.unique(inputx)).astype(float)
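To see why this one-liner works, here is a small sketch on a hypothetical toy array: `inputx[:, None]` has shape (N, 1), `np.unique(inputx)` has shape (K,), and broadcasting `==` across them yields an (N, K) boolean matrix with exactly one True per row.

```python
import numpy as np

# Hypothetical toy input with K = 3 distinct values
x = np.array([3, 1, 2, 3])

col = x[:, None]        # shape (4, 1)
uniq = np.unique(x)     # sorted unique values [1, 2, 3], shape (3,)

# Broadcasting (4, 1) against (3,) compares every element with every unique value
mask = col == uniq      # shape (4, 3), dtype bool
onehot = mask.astype(float)
# rows of onehot: [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]
```

Because `np.unique` returns the values sorted, column j always corresponds to the j-th smallest value, unlike `list(set(x))` in the original code, whose order is not guaranteed.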

Runtime test

Other approaches -

# Original soln
def ohc(x):
    u = list(set(x))
    c = len(u)
    X = np.zeros((len(x), c))
    for idx, val in enumerate(x):
        for i in range(c):
            if val == u[i]:
                X[idx, i] = 1
    return X

# @Tommalla's soln
def ohc_dict(x):
    elem_to_idx = {}
    for e in x:
        if e not in elem_to_idx:
            elem_to_idx[e] = len(elem_to_idx)
    c = len(elem_to_idx)
    X = np.zeros((len(x), c))
    for idx, val in enumerate(x):
        X[idx, elem_to_idx[val]] = 1
    return X

# @Paul Panzer's soln   
def unique_inverse(x):
    uniq, inv = np.unique(x, return_inverse=True)
    result = np.zeros((len(x), len(uniq)), dtype=int)
    result[np.arange(len(x)), inv] = 1
    return result

Timings -

In [42]: inputx = np.random.randint(1, 4, 1000000)

In [43]: %timeit ohc(inputx)
1 loops, best of 3: 526 ms per loop

In [44]: %timeit ohc_dict(inputx)
1 loops, best of 3: 256 ms per loop

In [45]: %timeit unique_inverse(inputx)
10 loops, best of 3: 48.6 ms per loop

In [46]: %timeit (inputx[:,None] == np.unique(inputx)).astype(float)
10 loops, best of 3: 34.4 ms per loop

Further performance boost -

Use np.int8 as the output dtype to get a further performance boost with the proposed method -

In [58]: %timeit (inputx[:,None] == np.unique(inputx)).astype(np.int8)
10 loops, best of 3: 27.7 ms per loop
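Part of this gain is plausibly just memory: a float64 output stores 8 bytes per entry, while int8 stores 1. A quick sketch comparing `nbytes` for an input like the one in the question (the exact timing benefit will depend on the machine):

```python
import numpy as np

x = np.random.randint(1, 4, 1000000)   # same kind of input as in the question
mask = x[:, None] == np.unique(x)      # (1000000, K) boolean matrix

as_float = mask.astype(float)          # float64: 8 bytes per entry
as_int8 = mask.astype(np.int8)         # int8: 1 byte per entry

print(as_float.nbytes // as_int8.nbytes)  # 8
```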

As @Paul Panzer suggested, we can also use view instead of a type conversion for a further boost on arrays with more unique numbers -

In [23]: inputx = np.random.randint(1, 40, 1000000)

In [24]: %timeit (inputx[:,None] == np.unique(inputx)).astype(np.int8)
10 loops, best of 3: 98.4 ms per loop

In [25]: %timeit (inputx[:,None] == np.unique(inputx)).view(np.int8)
10 loops, best of 3: 92.5 ms per loop
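The view trick works because NumPy's bool and int8 both use one byte per element, so `.view(np.int8)` reinterprets the existing buffer, whereas `.astype(np.int8)` allocates a new array and copies into it. A small sketch checking this with `np.shares_memory`:

```python
import numpy as np

x = np.random.randint(1, 40, 1000)
mask = x[:, None] == np.unique(x)   # bool array, 1 byte per element

viewed = mask.view(np.int8)         # reinterprets the same bytes, no copy
converted = mask.astype(np.int8)    # allocates a new array and copies

print(np.shares_memory(mask, viewed))     # True
print(np.shares_memory(mask, converted))  # False
print(np.array_equal(viewed, converted))  # True: True -> 1, False -> 0
```

One consequence of sharing memory: writing into the int8 view also changes the boolean mask, which is harmless here because the mask is a temporary.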

Answer 1 (score: 3)

Looks like a job for np.unique:
uniq, inv = np.unique(x, return_inverse=True)
result = np.zeros((len(x), len(uniq)), dtype=int)
result[np.arange(len(x)), inv] = 1
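To illustrate on a hypothetical toy array: `np.unique(..., return_inverse=True)` returns the sorted unique values together with, for each element of `x`, the index of its value in `uniq`. The fancy-indexed assignment then pairs row i with column `inv[i]`, so exactly one 1 is written per row.

```python
import numpy as np

x = np.array([3, 1, 2, 3])   # hypothetical toy input

uniq, inv = np.unique(x, return_inverse=True)
# uniq -> [1, 2, 3]; inv -> [2, 0, 1, 2], so uniq[inv] reconstructs x

result = np.zeros((len(x), len(uniq)), dtype=int)
# Only len(x) entries are written; the rest stay zero
result[np.arange(len(x)), inv] = 1

# Same matrix as the broadcasting comparison in the previous answer
assert np.array_equal(result, (x[:, None] == uniq).astype(int))
```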

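In response to @Divakar's benchmarks: here is a more informative comparison. It confirms that dv has a slight speed advantage on small alphabets, that the two cross over around K=20, and that the advantage reverses to pp being several times faster at K=1000. This is expected, because pp exploits the sparsity of the one-hot encoding. Below, K is the size of the alphabet and N is the length of the sample.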

import numpy as np
from timeit import timeit

def pp(x):
    uniq, inv = np.unique(x, return_inverse=True)
    result = np.zeros((len(x), len(uniq)), dtype=int)
    result[np.arange(len(x)), inv] = 1
    return result

def dv(x):
    return (x[:,None] == np.unique(x)).astype(int)


for K in (4, 10, 20, 40, 100, 200, 1000):
    tpp, tdv = [], []
    print('@ K =', K)
    for N in (1000, 10000, 100000):
        data = np.random.choice(np.random.random(K), N, replace=True)
        tdv.append(timeit('f(a)', number=100, globals={'f': dv, 'a': data}))
        tpp.append(timeit('f(a)', number=100, globals={'f': pp, 'a': data}))
    print('dv:', '{:.6f}, {:.6f}, {:.6f}'.format(*tdv), 'secs for 100 trials @ N = 1000, 10000, 100000')
    print('pp:', '{:.6f}, {:.6f}, {:.6f}'.format(*tpp), 'secs for 100 trials @ N = 1000, 10000, 100000')

Prints:

@ K = 4
dv: 0.003458, 0.038176, 0.421894 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.004856, 0.052298, 0.603758 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 10
dv: 0.005136, 0.056491, 0.663157 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.005955, 0.054069, 0.719152 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 20
dv: 0.007201, 0.084867, 0.988886 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.007638, 0.084580, 0.891122 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 40
dv: 0.010748, 0.130974, 1.498022 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.009321, 0.103912, 1.080271 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 100
dv: 0.025357, 0.292930, 2.946326 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.011916, 0.147117, 1.641588 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 200
dv: 0.033651, 0.560753, 6.042001 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.022971, 0.221142, 3.580255 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 1000
dv: 0.156715, 2.655647, 37.112166 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.055516, 0.920938, 10.358050 secs for 100 trials @ N = 1000, 10000, 100000

Using uint8, and allowing @Divakar's method to use the cheaper view cast:

@ K = 4
dv: 0.003092, 0.038149, 0.386140 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.004392, 0.043327, 0.554253 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 10
dv: 0.004604, 0.054215, 0.501708 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.004930, 0.051555, 0.607239 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 20
dv: 0.006421, 0.067397, 0.665465 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.006616, 0.054055, 0.703260 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 40
dv: 0.008857, 0.087155, 0.862316 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.006945, 0.060408, 0.733966 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 100
dv: 0.015660, 0.142464, 1.426929 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.008063, 0.070860, 0.908615 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 200
dv: 0.025631, 0.235712, 2.401750 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.008805, 0.101772, 1.111652 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 1000
dv: 0.069953, 1.024585, 11.313402 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.011558, 0.182684, 2.201837 secs for 100 trials @ N = 1000, 10000, 100000
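The code for these uint8 variants is not shown above. A plausible sketch, assuming pp only switches its zero matrix to `np.uint8` and dv replaces `astype(int)` with a `view(np.uint8)` of the boolean mask (the names `pp_u8` and `dv_u8` are hypothetical):

```python
import numpy as np

def pp_u8(x):
    # Fill a compact zero matrix, then set one entry per row via the inverse index
    uniq, inv = np.unique(x, return_inverse=True)
    result = np.zeros((len(x), len(uniq)), dtype=np.uint8)
    result[np.arange(len(x)), inv] = 1
    return result

def dv_u8(x):
    # Reinterpret the boolean comparison matrix as uint8 without copying
    return (x[:, None] == np.unique(x)).view(np.uint8)

# Same kind of data as the benchmark above, with K = 20
data = np.random.choice(np.random.random(20), 1000, replace=True)
assert np.array_equal(pp_u8(data), dv_u8(data))
```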

Answer 2 (score: 1)

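Your code runs in O(n + nc): O(n) for set() and O(nc) for the nested for loops. In most practical applications you will end up with O(nc)* anyway, because you need to allocate space for the array. However, there are a few tricks to make it more efficient: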

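  1. Use a dictionary. Dicts are implemented with hashing, so a lookup should take constant time on average.
  2. Instead of iterating over all c possible features at every step, remember the index of each feature.
  3. Here is my implementation: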

    import numpy as np
    
    def ohc(x):
        elem_to_idx = {}
        for e in x:
            if e not in elem_to_idx:
                elem_to_idx[e] = len(elem_to_idx)
        c = len(elem_to_idx)
        X = np.zeros((len(x), c))
        for idx, val in enumerate(x):
            X[idx, elem_to_idx[val]] = 1
        return X
    

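    *Depending on what you intend to do with the X matrix, you may want to use a scipy.sparse matrix, which doesn't allocate as much memory and can in turn let your code run in O(n) instead of O(nc).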
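For the footnote's suggestion, a minimal sketch, assuming SciPy is available: build the one-hot matrix directly in CSR form from the row indices and the inverse column indices, so storage grows with the n nonzero entries rather than with n × c dense values. The function name `ohc_sparse` is hypothetical.

```python
import numpy as np
from scipy import sparse

def ohc_sparse(x):
    # Column index of each element's value among the sorted unique values
    uniq, inv = np.unique(x, return_inverse=True)
    n, c = len(x), len(uniq)
    data = np.ones(n, dtype=np.int8)   # one stored nonzero per row
    rows = np.arange(n)
    # (data, (row, col)) triplets; memory grows with n, not n * c
    return sparse.csr_matrix((data, (rows, inv)), shape=(n, c))

x = np.random.randint(1, 4, 1000000)
X = ohc_sparse(x)
print(X.nnz)  # 1000000: exactly one nonzero per row
```

Note that `np.unique` sorts, so this particular sketch takes O(n log n) time; the dict-based index above avoids the sort if strict O(n) time matters.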