Theano: GPU usage on a new Jetson TX1

Time: 2016-09-17 22:57:34

Tags: theano theano-cuda

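(Very long error message in the question below. TL;DR, here is the specific question: why does this test code not execute on the TX1's GPU, and what do I need to do to make it do so?)

I have just flashed and set up a brand-new Nvidia Jetson TX1 using JetPack 2.3. I am trying to get Theano installed on the TX1 so that it can use the onboard GPU for further machine learning and neural network applications.

However, I cannot seem to get the GPU itself working.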

Theano was installed following the instructions here:

sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libblas-dev git
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git --user  # Need Theano 0.8(not yet released) or more recent

The installed Theano version is 0.9.0.dev2, and Python is version 2.7.12.

I used the test script from here:

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

When run as suggested:

THEANO_FLAGS=device=cuda0 python gpu_tutorial1.py

I get the following response, full of errors and warnings, and with execution on the CPU rather than the GPU:

ERROR (theano.gpuarray): pygpu was configured but could not be imported
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 21, in <module>
    import pygpu
ImportError: No module named pygpu
WARNING (theano.gof.cmodule): OPTIMIZATION WARNING: Theano was not able to find the default g++ parameters. This is needed to tune the compilation to your specific CPU. This can slow down the execution of Theano functions. Please submit the following lines to Theano's mailing list so that we can fix this problem:
 ['# 1 "<stdin>"\n', '# 1 "<built-in>"\n', '# 1 "<command-line>"\n', '# 1 "/usr/include/stdc-predef.h" 1 3 4\n', '# 1 "<command-line>" 2\n', '# 1 "<stdin>"\n', 'Using built-in specs.\n', 'COLLECT_GCC=/usr/bin/g++\n', 'Target: aarch64-linux-gnu\n', "Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64 --with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu\n", 'Thread model: posix\n', 'gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2) \n', "COLLECT_GCC_OPTIONS='-E' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'\n", ' /usr/lib/gcc/aarch64-linux-gnu/5/cc1 -E -quiet -v -imultiarch aarch64-linux-gnu - -mlittle-endian -mabi=lp64 -fstack-protector-strong -Wformat -Wformat-security\n', 'ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"\n', 'ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/5/../../../../aarch64-linux-gnu/include"\n', '#include "..." search starts here:\n', '#include <...> search starts here:\n', ' /usr/lib/gcc/aarch64-linux-gnu/5/include\n', ' /usr/local/include\n', ' /usr/lib/gcc/aarch64-linux-gnu/5/include-fixed\n', ' /usr/include/aarch64-linux-gnu\n', ' /usr/include\n', 'End of search list.\n', 'COMPILER_PATH=/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/\n', 'LIBRARY_PATH=/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../../lib/:/lib/aarch64-linux-gnu/:/lib/../lib/:/usr/lib/aarch64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../:/lib/:/usr/lib/\n', "COLLECT_GCC_OPTIONS='-E' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'\n"]
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 12.736936 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the cpu

When I change the device flag to 'gpu':

THEANO_FLAGS=device=gpu python gpu_tutorial1.py

things improve somewhat: at least the NVIDIA Tegra X1 is found, although in the end it is still not used.

I intend to send the warning lines to the Theano mailing list, but that warning seems unrelated to my main problem at present: why does this test code not execute on the TX1's GPU, and what do I need to do to make it do so?

1 answer:

Answer 0 (score: 2)

It turns out that the CLI invocation recommended on that site is not correct. The correct invocation is:

THEANO_FLAGS='device=gpu,floatX=float32' python gpu_tutorial1.py

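This is enough to execute on the GPU with a satisfying speedup (noticeable, and reported in the output), and it gets rid of that gobsmackingly long warning.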
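A likely reason the second flag matters: the test script builds its array with config.floatX, the failing run shows TensorType(float64, vector) in the graph, and as far as I know the old device=gpu backend only accelerates float32 data. If you would rather set the flags from inside Python than on the command line, here is a rough sketch. It assumes Theano reads THEANO_FLAGS when it is first imported, so the variable has to be set before the import. build_theano_flags is a hypothetical helper of mine, not part of Theano.

```python
import os


def build_theano_flags(**flags):
    """Join keyword arguments into the comma-separated key=value
    format that the THEANO_FLAGS environment variable uses."""
    return ",".join("%s=%s" % (key, value)
                    for key, value in sorted(flags.items()))


# Must happen before the first "import theano" in the process.
os.environ["THEANO_FLAGS"] = build_theano_flags(device="gpu",
                                                floatX="float32")
print(os.environ["THEANO_FLAGS"])

# import theano  # uncomment on a machine where Theano is installed
```

Flags set this way apply only to the current process, which is handy for a single script but does not replace a persistent configuration.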

Putting these two flags in a .theanorc file is also sufficient, and it simplifies the invocation.
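For reference, a small sketch of what that file could contain. It assumes the usual layout, a [global] section in ~/.theanorc, and it is written for Python 3's configparser rather than the Python 2.7 used in the question. theanorc_text is a hypothetical helper name; the snippet only prints the text and leaves it to you to save it.

```python
import configparser
import io


def theanorc_text(device="gpu", floatx="float32"):
    """Return .theanorc contents holding the two flags from the answer."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the capital X in "floatX"
    parser["global"] = {"device": device, "floatX": floatx}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


# Print the text; copy it into ~/.theanorc by hand if it looks right.
print(theanorc_text())
```

With that file in place, a plain `python gpu_tutorial1.py` should pick up the same settings as the longer command line.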