Question

Doing math with numpy dtypes (uint32 in particular) seems to take longer than doing the same math on regular Python ints. Here is my real-life example code:

import numpy

## Binary encoding of DNA as python int
bDic = {'A': 0 ,'C': 1 ,'G': 2 ,'T': 3  } # DNA to 32bit binary...
tDic = ['A',    'C',    'G',    'T'     ] # ...and back again :)
range32 = range(0,32,2)

def string_up2bit(string):
    up2bit = 3
    for char in reversed(string): up2bit = (up2bit << 2) + bDic[char]
    return up2bit
def up2bit_string(value):
    up2bits = [((value >> x) & 3) for x in range32]
    return ''.join([tDic[up2bit] for up2bit in up2bits[:-up2bits[::-1].index(3)-1]])

## Binary encoding of DNA as numpy uint32 (what i will actually be saving to disk)
n0,n1,n2,n3 = numpy.uint32(0),numpy.uint32(1),numpy.uint32(2),numpy.uint32(3)
npbDic = { 'A': n0 ,'C': n1 ,'G': n2 ,'T': n3 } # DNA to 32bit binary...
nptDic = { n0 :'A', n1 :'C', n2 :'G', n3 :'T' } # ...and back again :)
nprange32 = list(numpy.arange(0,32,2,dtype='uint32'))

def np_string_up2bit(string):
    up2bit = n3
    for char in reversed(string): up2bit = (up2bit << n2) + npbDic[char]
    return up2bit
def np_up2bit_string(value):
    up2bits = [((value >> x) & n3) for x in nprange32] # The 32 here makes it 32bit only.
    return ''.join([nptDic[up2bit] for up2bit in up2bits[:-up2bits[::-1].index(n3)-1]])

## Begin test:
## Read 10000000 lines of DNA from a file, convert into binary and back again.
DNA = 'ATTCGACTTGACTG'
r = 0
while r != 10000000:
    r += 1
    #up2bit_string(string_up2bit(DNA))        # Takes 1min 12sec
    np_up2bit_string(np_string_up2bit(DNA))   # Takes 1min 45sec

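As you can see at the bottom, the numpy uint32 version takes 45% longer than the Python int version (1 min 45 sec versus 1 min 12 sec). Nothing in the code above should be converting NumPy uint32s to Python ints, so conversion does not explain the slowdown; simply using uint32s appears to be slower. On real-world data sets this translates into days of extra compute time.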
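To isolate the effect from the rest of the program, the shift-and-add step can be timed on its own. The following is only a minimal sketch: the variable names and the iteration count are assumptions chosen for illustration, and the timings will vary by machine and NumPy version.

```python
import timeit
import numpy

# Operands of each kind; the names here are illustrative only.
py_a, py_b = 3, 2
np_a, np_b = numpy.uint32(3), numpy.uint32(2)

# Time the same shift-and-add step used in the loops above,
# once with Python ints and once with numpy uint32 scalars.
py_time = timeit.timeit(lambda: (py_a << py_b) + py_a, number=200000)
np_time = timeit.timeit(lambda: (np_a << np_b) + np_a, number=200000)

# Both versions compute the same value; only the operand type differs.
py_result = (py_a << py_b) + py_a
np_result = (np_a << np_b) + np_a

print("python int  :", py_time)
print("numpy uint32:", np_time)
```

If the numpy line is consistently slower here too, the overhead is in the scalar operations themselves rather than in the dictionary lookups or string joins.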

Does anyone know how to speed this up? Is there perhaps a way to make uint32 math the default in Python? Or should I try ctypes instead of numpy dtypes?
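One option that may be worth trying, offered as an untested assumption rather than a measured fix: keep the per-character arithmetic on plain Python ints and convert to uint32 only once, in bulk, just before saving. In the sketch below, `BASE_TO_BITS`, `encode`, and the sample sequences are hypothetical names introduced for illustration.

```python
import numpy

# Same 2-bit encoding as bDic above (hypothetical name).
BASE_TO_BITS = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def encode(seq):
    """Pack a DNA string into a plain Python int, with 3 as the end marker."""
    value = 3
    for ch in reversed(seq):
        value = (value << 2) + BASE_TO_BITS[ch]
    return value

# Hypothetical sample data; each sequence must be at most 15 bases
# so that the packed value (30 data bits + 2 marker bits) fits in 32 bits.
sequences = ['ATTCGACTTGACTG', 'ACGT']

# Convert to uint32 once, for the whole batch, instead of per operation.
packed = numpy.array([encode(s) for s in sequences], dtype=numpy.uint32)
```

The resulting `packed` array has dtype uint32 and can be written to disk as before, while the inner loop never touches a numpy scalar.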

Edited so that anyone can test the code, with the DNA data provided.

Math on numpy dtypes

0 answers: