Is my Theano program actually using the GPU?

Time: 2016-02-28 23:27:05

Tags: performance profiling gpu gpgpu theano

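Theano claims that it is using the GPU: at startup it reports which device it is using, and so on. In addition, nvidia-smi says the GPU is in use.

But the running time seems to be exactly the same whether or not I use it.

Could it have something to do with integer arithmetic?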

import sys

import numpy as np
import theano
import theano.tensor as T


def ariths(v, ub):
  """Given a sorted vector v and scalar ub, returns multiples of elements in v.

  Specifically, returns a vector containing all numbers j * k < ub where j is in
  v and k >= j.  Some elements may occur more than once in the output.
  """

  lp = v[0]
  v = T.shape_padright(v)
  a = T.shape_padleft(T.arange(0, (ub + lp - 1) // lp - lp, 1, 'int64'))
  res = v * (a + v)
  return res[(res < ub).nonzero()]


def filter_composites(pv, using_primes):
  a = ariths(using_primes, pv.size)
  return T.set_subtensor(pv[a], 0)


def _iterfn(prev_bnds, pv):
  bstart = prev_bnds[0]
  bend = prev_bnds[1]
  use_primes = pv[bstart:bend].nonzero()[0] + bstart
  pv = filter_composites(pv, use_primes)
  return pv


def primes_to(n):
  if n <= 2:
    return np.asarray([])
  elif n <= 3:
    return np.asarray([2])

  res = T.ones(n, 'int8')
  res = T.set_subtensor(res[:2], 0)

  ubs = [[2, 4]]
  ub = 4
  while ub ** 2 < n:
    prevub = ub
    ub *= 2
    ubs.append([prevub, ub])
  (r, u5) = theano.scan(fn=_iterfn,
                        outputs_info=res, sequences=[np.asarray(ubs)])
  return r[-1].nonzero()[0]


def main(n):
  print(primes_to(n).size.eval())

if __name__ == '__main__':
  main(int(sys.argv[1]))

1 Answer:

Answer 0 (score: 3)

The answer is yes. And no. If you profile your code with nvprof on a GPU-enabled Theano installation, you will see something like this:

==16540== Profiling application: python ./theano_test.py
==16540== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 49.22%  12.096us         1  12.096us  12.096us  12.096us  kernel_reduce_ccontig_node_c8d7bd33dfef61705c2854dd1f0cb7ce_0(unsigned int, float const *, float*)
 30.60%  7.5200us         3  2.5060us     832ns  5.7600us  [CUDA memcpy HtoD]
 13.93%  3.4240us         1  3.4240us  3.4240us  3.4240us  [CUDA memset]
  6.25%  1.5350us         1  1.5350us  1.5350us  1.5350us  [CUDA memcpy DtoH]

That is, at least a reduce operation is executed on your GPU. However, if you modify your main like this:

def main():
  n = 100000000 
  print(primes_to(n).size.eval())

if __name__ == '__main__':
    import cProfile, pstats
    cProfile.run("main()", "{}.profile".format(__file__))
    s = pstats.Stats("{}.profile".format(__file__))
    s.strip_dirs()
    s.sort_stats("time").print_stats(10)

and use cProfile to profile your code, you will see something like this:

Thu Mar 10 14:35:24 2016    ./theano_test.py.profile

         486743 function calls (480590 primitive calls) in 17.444 seconds

   Ordered by: internal time
   List reduced from 1138 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    6.376    6.376   16.655   16.655 {theano.scan_module.scan_perform.perform}
       13    6.168    0.474    6.168    0.474 subtensor.py:2084(perform)
       27    2.910    0.108    2.910    0.108 {method 'nonzero' of 'numpy.ndarray' objects}
       30    0.852    0.028    0.852    0.028 {numpy.core.multiarray.concatenate}
       27    0.711    0.026    0.711    0.026 {method 'astype' of 'numpy.ndarray' objects}
       13    0.072    0.006    0.072    0.006 {numpy.core.multiarray.arange}
        1    0.034    0.034   17.142   17.142 function_module.py:482(__call__)
      387    0.020    0.000    0.052    0.000 graph.py:486(stack_search)
       77    0.016    0.000   10.731    0.139 op.py:767(rval)
      316    0.013    0.000    0.066    0.000 graph.py:715(general_toposort)

The slowest operation (by a narrow margin) is the scan call, and if you look at the scan source you can see that GPU execution of scan is currently disabled.
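One way to double-check this without reading the scan source is to list the ops in the compiled graph and see which of them are GPU ops. The following is only a minimal sketch: the helper names `split_ops_by_device` and `inspect_compiled` are made up for illustration, and the "class name contains `Gpu`" test is a heuristic borrowed from the check in Theano's "Using the GPU" tutorial, not an official API.

```python
def split_ops_by_device(apply_nodes):
    """Partition op class names by whether they look like GPU ops.

    Heuristic (an assumption): GPU op classes have 'Gpu' in their
    class name, e.g. GpuElemwise. Everything else is treated as CPU.
    """
    gpu_ops, cpu_ops = [], []
    for node in apply_nodes:
        name = type(node.op).__name__
        if "Gpu" in name:
            gpu_ops.append(name)
        else:
            cpu_ops.append(name)
    return gpu_ops, cpu_ops


def inspect_compiled(f):
    """Given a compiled theano.function f, list its GPU and CPU ops.

    f.maker.fgraph.toposort() returns the apply nodes of the
    optimized outer graph in execution order.
    """
    return split_ops_by_device(f.maker.fgraph.toposort())


# Hypothetical usage with the question's code (requires Theano):
#   import theano
#   f = theano.function([], primes_to(100000).size)
#   gpu_ops, cpu_ops = inspect_compiled(f)
#   print("GPU ops:", gpu_ops)
#   print("CPU ops:", cpu_ops)
```

Note that this only looks at the outer graph. The scan shows up there as a single node, and the ops inside its inner function are not listed, so treat the result as a rough indication rather than a full accounting.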

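So the answer is yes, the GPU is being used for something in your code, but no, the most time-consuming operations run on the CPU, because GPU execution of scan appears to be disabled in the Theano code at present.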
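To put a rough number on that, you can add up the Time column of the nvprof output and compare it with the cProfile total. This is only a back-of-the-envelope sketch: the two profiles above are not necessarily from runs with the same n, the cProfile total also includes graph compilation, and the kernel name is shortened here for readability. The helper `total_gpu_seconds` is a made-up name.

```python
import re

# Rows copied from the nvprof output above (kernel name shortened).
NVPROF_ROWS = """\
 49.22%  12.096us         1  12.096us  12.096us  12.096us  kernel_reduce_ccontig_node
 30.60%  7.5200us         3  2.5060us     832ns  5.7600us  [CUDA memcpy HtoD]
 13.93%  3.4240us         1  3.4240us  3.4240us  3.4240us  [CUDA memset]
  6.25%  1.5350us         1  1.5350us  1.5350us  1.5350us  [CUDA memcpy DtoH]
"""

UNIT_TO_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Total time reported in the cProfile header above.
CPROFILE_TOTAL_SECONDS = 17.444


def total_gpu_seconds(rows):
    """Sum the second column (total time per entry) of nvprof rows."""
    total = 0.0
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        match = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", fields[1])
        if match:
            total += float(match.group(1)) * UNIT_TO_SECONDS[match.group(2)]
    return total


gpu_seconds = total_gpu_seconds(NVPROF_ROWS)
print("GPU activity: {:.3f} us".format(gpu_seconds * 1e6))
print("Fraction of cProfile total: {:.2e}".format(
    gpu_seconds / CPROFILE_TOTAL_SECONDS))
```

The GPU entries add up to roughly 24.6 microseconds, against about 17.4 seconds overall, so whatever ran on the GPU was far too small to affect the running time you measured.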