Speeding up a vectorized eye-tracking algorithm in numpy

Date: 2016-03-14 19:23:54

Tags: algorithm performance opencv numpy eye-tracking

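I'm trying to implement Fabian Timm's eye-tracking algorithm [http://www.inb.uni-luebeck.de/publikationen/pdfs/TiBa11b.pdf] (found here: [http://thume.ca/projects/2012/11/04/simple-accurate-eye-center-tracking-in-opencv/]) in numpy and OpenCV, and I've hit a snag. I think I've vectorized my implementation reasonably well, but it is still not fast enough to run in real time, and it does not detect pupils as accurately as I had hoped. This is my first time using numpy, so I'm not sure what I've done wrong.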

And the rest of the code:

def find_pupil(eye):
    eye_len = np.arange(eye.shape[0])
    xx,yy = np.meshgrid(eye_len,eye_len) #coordinates
    XX,YY = np.meshgrid(xx.ravel(),yy.ravel()) #all distance vectors
    Dx,Dy = [YY-XX, YY-XX] #y2-y1, x2-x1 -- simpler this way because YY = XXT
    Dlen = np.sqrt(Dx**2+Dy**2)
    Dx,Dy = [Dx/Dlen, Dy/Dlen] #normalized

    Gx,Gy = np.gradient(eye)
    Gmagn = np.sqrt(Gx**2+Gy**2)

    Gx,Gy = [Gx/Gmagn,Gy/Gmagn] #normalized
    GX,GY = np.meshgrid(Gx.ravel(),Gy.ravel())

    X = (GX*Dx+GY*Dy)**2
    eye = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1])) #inverting and blurring eye for use as w
    eyem = np.repeat(eye.ravel()[np.newaxis,:],eye.size,0)
    C = (np.nansum(eyem*X, axis=0)/eye.size).reshape(eye.shape)

    return np.unravel_index(C.argmax(), C.shape)

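1 Answer:

Answer 0: (score: 2)

Many of the operations that replicate elements before doing arithmetic can be replaced by arithmetic performed directly on arrays that have been given singleton dimensions, which lets NumPy broadcasting do the expansion. This brings two benefits: the expansion happens on the fly, which saves workspace memory, and performance improves. Also, at the end, the nansum calculation can be replaced with a simplified version. With all of that in mind, here is one modified approach: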

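import cv2
import numpy as np

def find_pupil_v2(face, x, y, w, h):
    eye = face[x:x+w,y:y+h]  # eye region; the code below assumes it is square (w == h)
    eye_len = np.arange(eye.shape[0])

    N = eye_len.size**2  # number of pixels in the (square) eye region
    eye_len_diff = eye_len[:,None] - eye_len  # pairwise index differences via broadcasting
    Dlen = np.sqrt(2*((eye_len_diff)**2))  # zero on the diagonal
    Dxy0 = eye_len_diff/Dlen  # normalized; 0/0 on the diagonal gives NaN

    Gx0,Gy0 = np.gradient(eye)
    Gmagn = np.sqrt(Gx0**2+Gy0**2)
    Gx,Gy = [Gx0/Gmagn,Gy0/Gmagn] #normalized

    # Singleton axes let broadcasting form the products without repeating data
    B0 = Gy[:,:,None]*Dxy0[:,None,:]
    C0 = Gx[:,None,:]*Dxy0
    X = ((C0.transpose(1,0,2)[:,None,:,:]+B0[:,:,None,:]).reshape(N,N))**2

    eye1 = cv2.bitwise_not(cv2.GaussianBlur(eye,(5,5),0.005*eye.shape[1])) #inverted, blurred eye used as weights
    # The weights are constant along axis 0, so they are applied after the nansum
    C = (np.nansum(X,0)*eye1.ravel()/eye1.size).reshape(eye1.shape)

    return np.unravel_index(C.argmax(), C.shape)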
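
To make the broadcasting idea concrete, here is a minimal sketch on toy arrays. It is not part of the function above; the names n, idx, G and D are made up for illustration. It shows that subtracting a column from a row gives the same pairwise differences as building full meshgrid arrays, and that inserting singleton axes produces an outer-style product without replicating anything.

```python
import numpy as np

n = 4
idx = np.arange(n)

# Replicating approach: build two full (n, n) grids, then subtract.
cols, rows = np.meshgrid(idx, idx)   # cols[i, j] = j, rows[i, j] = i
diff_grid = rows - cols

# Broadcasting approach: an (n, 1) column minus an (n,) row.
diff_bcast = idx[:, None] - idx      # diff_bcast[i, j] = i - j

same = np.array_equal(diff_grid, diff_bcast)
print(same)

# Singleton axes also give 3-D products without np.repeat or meshgrid.
rng = np.random.default_rng(0)
G = rng.random((n, n))
D = rng.random((n, n))
B = G[:, :, None] * D[:, None, :]          # B[a, b, c] = G[a, b] * D[a, c]
B_ref = np.einsum('ab,ac->abc', G, D)      # same product, written explicitly
print(np.allclose(B, B_ref))
```

The broadcast versions never materialize the repeated operands, which is where the memory saving is expected to come from.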

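There was still one repeat left at Dxy. It may be possible to avoid that step and feed Dxy0 directly into the step that uses Dxy to give us X, but I have not worked through it. Everything is now converted to be broadcasting-based!

Runtime test and output verification: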
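The simplified nansum rests on the weights being constant along the summed axis, so they can be pulled out of the sum. A small sketch of that identity follows, assuming finite weights; the arrays X and w here are random stand-ins, not the real gradient products or eye image.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.random((n, n))
X[rng.random((n, n)) < 0.2] = np.nan   # some NaNs, as a 0/0 division would produce
w = rng.random(n)                      # one finite weight per column (stand-in values)

# Replicating approach: tile the weights into an (n, n) array first.
W = np.repeat(w[np.newaxis, :], n, axis=0)
slow = np.nansum(W * X, axis=0)

# Factored approach: sum each column once, then scale by its weight.
fast = np.nansum(X, axis=0) * w

print(np.allclose(slow, fast))
```

The factored form skips both the (n, n) weight array and the elementwise product over it.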

In [539]: # Inputs with random elements
     ...: face = np.random.randint(0,10,(256,256)).astype('uint8')
     ...: x = 40
     ...: y = 60
     ...: w = 64
     ...: h = 64
     ...: 

In [540]: find_pupil(face,x,y,w,h)
Out[540]: (32, 63)

In [541]: find_pupil_v2(face,x,y,w,h)
Out[541]: (32, 63)

In [542]: %timeit find_pupil(face,x,y,w,h)
1 loops, best of 3: 4.15 s per loop

In [543]: %timeit find_pupil_v2(face,x,y,w,h)
1 loops, best of 3: 529 ms per loop

It looks like we are getting close to an 8x speedup!
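Outside IPython, the same check-then-time routine can be scripted. The sketch below uses a hypothetical helper, check_and_time, and two toy functions standing in for find_pupil and find_pupil_v2; it only illustrates the workflow, and the timings it prints will vary by machine.

```python
import timeit
import numpy as np

def check_and_time(f_old, f_new, *args, number=5):
    """Hypothetical helper: assert both functions agree, then time each one."""
    out_old, out_new = f_old(*args), f_new(*args)
    assert np.array_equal(out_old, out_new), "outputs differ"
    t_old = timeit.timeit(lambda: f_old(*args), number=number) / number
    t_new = timeit.timeit(lambda: f_new(*args), number=number) / number
    return t_old, t_new

# Toy stand-ins for the original and the rewritten function.
def sum_sq_loop(a):
    return sum(int(v) ** 2 for v in a)   # element-by-element Python loop

def sum_sq_vec(a):
    return int(np.dot(a, a))             # single vectorized call

t_old, t_new = check_and_time(sum_sq_loop, sum_sq_vec, np.arange(200))
print(f"old: {t_old:.2e} s, new: {t_new:.2e} s, ratio: {t_old / t_new:.1f}x")
```

Checking the outputs before timing guards against a rewrite that is fast only because it computes something different.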