TensorFlow: InternalError: Blas SGEMM launch failed

Date: 2016-05-20 04:00:46

Tags: tensorflow blas

When I run sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}), I get InternalError: Blas SGEMM launch failed. Here is the full error and stack trace:

InternalErrorTraceback (most recent call last)
<ipython-input-9-a3261a02bdce> in <module>()
      1 batch_xs, batch_ys = mnist.train.next_batch(100)
----> 2 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
    338     try:
    339       result = self._run(None, fetches, feed_dict, options_ptr,
--> 340                          run_metadata_ptr)
    341       if run_metadata:
    342         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
    562     try:
    563       results = self._do_run(handle, target_list, unique_fetches,
--> 564                              feed_dict_string, options, run_metadata)
    565     finally:
    566       # The movers are no longer used. Delete them.

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
    635     if handle is None:
    636       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
--> 637                            target_list, options, run_metadata)
    638     else:
    639       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
    657       # pylint: disable=protected-access
    658       raise errors._make_specific_exception(node_def, op, error_message,
--> 659                                             e.code)
    660       # pylint: enable=protected-access
    661 

InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
     [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_4, Variable/read)]]
Caused by op u'MatMul', defined at:
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/ipykernel/__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 596, in launch_instance
    app.start()
  File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 442, in start
    ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 162, in start
    super(ZMQIOLoop, self).start()
  File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 883, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 391, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 199, in do_execute
    shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2723, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2825, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-d7414c4b6213>", line 4, in <module>
    y = tf.nn.softmax(tf.matmul(x, W) + b)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1036, in matmul
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 911, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
    self._traceback = _extract_stack()

Stack: EC2 g2.8xlarge machine, Ubuntu 14.04

16 Answers:

Answer 0 (score: 93)

Old question, but it may help someone else.
Try closing interactive sessions that are active in other processes (if it's an IPython Notebook, just restart the kernel). This helped me!

Additionally, I use this code to close the local session in this kernel during experiments:

# Close a session left over from a previous run so it releases the GPU.
if 'session' in locals() and session is not None:
    print('Close interactive session')
    session.close()
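
A related pattern that avoids leaked sessions in the first place (not part of the original answer; just standard TF 1.x usage) is the context-manager form, sketched here:

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0], [4.0]])
product = tf.matmul(a, b)

# The session releases its GPU resources when the with-block exits,
# even if an exception is raised, so nothing is left holding the GPU.
with tf.Session() as sess:
    print(sess.run(product))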

Answer 1 (score: 6)

I encountered this problem and solved it by setting allow_soft_placement=True and gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3), which specifically define the fraction of GPU memory the process may use. I guess this helps to avoid two tensorflow processes competing for the GPU memory.

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(
    gpu_options=gpu_options, allow_soft_placement=True, log_device_placement=True))
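
With log_device_placement=True, the session logs which device each op was assigned to when the graph is launched, which makes it easy to confirm where each op actually ran.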

Answer 2 (score: 4)

I got this error running Tensorflow Distributed. Have you checked whether any of the workers are reporting CUDA_OUT_OF_MEMORY errors? If that's the case, it may be related to where you place your weight and bias variables. E.g.:

with tf.device("/job:paramserver/task:0/cpu:0"):
   W = weight_variable([input_units, num_hidden_units])       
   b = bias_variable([num_hidden_units])             
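
The weight_variable and bias_variable helpers are not defined in the answer; their bodies below are an assumption, following the conventions of the TensorFlow MNIST tutorial, and are shown only for completeness:

import tensorflow as tf

def weight_variable(shape):
    # Small truncated-normal noise breaks the symmetry between units.
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # A small positive constant avoids "dead" units at the start.
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)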

Answer 3 (score: 4)

My environment is Python 3.5, Tensorflow 0.12, and Windows 10 (no Docker). I am training neural networks on both CPU and GPU. Whenever training ran in the GPU, I ran into the same error, InternalError: Blas SGEMM launch failed.

I could not find the reason why this error happens, but I managed to run my code in the GPU by avoiding the tensorflow function tensorflow.contrib.slim.one_hot_encoding(). Instead, I do the one-hot encoding operation in numpy (for both input and output variables).

The following code reproduces the error and the fix. It is a minimal setup to learn the function y = x ** 2 using gradient descent.

import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim

def test_one_hot_encoding_using_tf():

    # This function raises the "InternalError: Blas SGEMM launch failed" when run in the GPU

    # Initialize
    tf.reset_default_graph()
    input_size = 10
    output_size = 100
    input_holder = tf.placeholder(shape=[1], dtype=tf.int32, name='input')
    output_holder = tf.placeholder(shape=[1], dtype=tf.int32, name='output')

    # Define network
    input_oh = slim.one_hot_encoding(input_holder, input_size)
    output_oh = slim.one_hot_encoding(output_holder, output_size)
    W1 = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))
    output_v = tf.matmul(input_oh, W1)
    output_v = tf.reshape(output_v, [-1])

    # Define updates
    loss = tf.reduce_sum(tf.square(output_oh - output_v))
    trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    update_model = trainer.minimize(loss)

    # Optimize
    init = tf.initialize_all_variables()
    steps = 1000

    # Force CPU/GPU
    config = tf.ConfigProto(
        # device_count={'GPU': 0}  # uncomment this line to force CPU
    )

    # Launch the tensorflow graph
    with tf.Session(config=config) as sess:
        sess.run(init)

        for step_i in range(steps):

            # Get sample
            x = np.random.randint(0, 10)
            y = np.power(x, 2).astype('int32')

            # Update
            _, l = sess.run([update_model, loss], feed_dict={input_holder: [x], output_holder: [y]})

        # Check model
        print('Final loss: %f' % l)

def test_one_hot_encoding_no_tf():

    # This function does not raise the "InternalError: Blas SGEMM launch failed" when run in the GPU

    def oh_encoding(label, num_classes):
        return np.identity(num_classes)[label:label + 1].astype('int32')

    # Initialize
    tf.reset_default_graph()
    input_size = 10
    output_size = 100
    input_holder = tf.placeholder(shape=[1, input_size], dtype=tf.float32, name='input')
    output_holder = tf.placeholder(shape=[1, output_size], dtype=tf.float32, name='output')

    # Define network
    W1 = tf.Variable(tf.random_uniform([input_size, output_size], 0, 0.01))
    output_v = tf.matmul(input_holder, W1)
    output_v = tf.reshape(output_v, [-1])

    # Define updates
    loss = tf.reduce_sum(tf.square(output_holder - output_v))
    trainer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
    update_model = trainer.minimize(loss)

    # Optimize
    init = tf.initialize_all_variables()
    steps = 1000

    # Force CPU/GPU
    config = tf.ConfigProto(
        # device_count={'GPU': 0}  # uncomment this line to force CPU
    )

    # Launch the tensorflow graph
    with tf.Session(config=config) as sess:
        sess.run(init)

        for step_i in range(steps):

            # Get sample
            x = np.random.randint(0, 10)
            y = np.power(x, 2).astype('int32')

            # One hot encoding
            x = oh_encoding(x, 10)
            y = oh_encoding(y, 100)

            # Update
            _, l = sess.run([update_model, loss], feed_dict={input_holder: x, output_holder: y})

        # Check model
        print('Final loss: %f' % l)
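
To reproduce, call test_one_hot_encoding_using_tf() on a GPU build to see the failure, and test_one_hot_encoding_no_tf() to see the same model train without it; uncommenting the device_count={'GPU': 0} line in either function forces the run onto the CPU, where both versions work.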

Answer 4 (score: 3)

Maybe you haven't freed your GPU properly. If you are using Linux, try "ps -ef | grep python" to see which jobs are using the GPU, then kill them (e.g. with kill -9 <pid>).

Answer 5 (score: 2)

In my case, I had two python consoles open, both using keras/tensorflow. As soon as I closed the old console (forgotten from the previous day), everything started working correctly.

So it is good to check that you don't have multiple consoles/processes occupying the GPU.

Answer 6 (score: 1)

I closed all the other Jupyter sessions I had running and this solved the problem. I believe it was a GPU memory issue.

Answer 7 (score: 1)

For me, I got this problem when I tried to run multiple tensorflow processes (e.g. 2), both of which required access to GPU resources.

A simple solution is to make sure there is only one tensorflow process running at a time.

For more details, see here:

    To be clear, tensorflow will try (by default) to consume all available GPU memory. It cannot be run alongside other programs that are also active. Closing. Feel free to reopen if this is really a different problem.
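
A common mitigation when more than one process must share the GPU (not from the original answer; a sketch using the standard TF 1.x allow_growth option) is to let each session allocate GPU memory on demand instead of claiming it all up front:

import tensorflow as tf

config = tf.ConfigProto()
# Grow the allocation as needed rather than reserving all GPU memory
# at startup, leaving room for other processes.
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)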

Answer 8 (score: 1)

In my case,

first, I ran

conda clean --all

to clean up tarballs and unused packages.

Then, I restarted the IDE (Pycharm in this case) and it worked well. Environment: Anaconda python 3.6, Windows 10 64bit. I installed tensorflow-gpu via the command provided on the anaconda website.

Answer 9 (score: 1)

Tensorflow 2.0 Compatible Answer: Providing 2.0-compatible code for erko's answer, for the benefit of the community.

session = tf.compat.v1.Session()

if 'session' in locals() and session is not None:
    print('Close interactive session')
    session.close()
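
For code that is fully on TF 2.x (no v1 sessions at all), the closest equivalent to the memory-fraction workarounds above is memory growth; a sketch, assuming TF 2.x and its documented tf.config API:

import tensorflow as tf

# Let TensorFlow grow its GPU allocation on demand instead of reserving
# everything, so several processes can coexist on one GPU.
# This must run before any GPU has been initialized.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)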

Answer 10 (score: 0)

I encountered this error when running Keras CuDNN tests in parallel with pytest-xdist. The solution was to run them serially.

Answer 11 (score: 0)

For me, I got this error when using Keras with Tensorflow as the backend. It was because the deep learning environment in Anaconda was not activated properly, and as a result, Tensorflow didn't start up properly either. I noticed this because, since the last time I activated my deep learning environment (called dl), the prompt in my Anaconda Prompt had changed to:

(dl) C:\Users\georg\Anaconda3\envs\dl\etc\conda\activate.d>set "KERAS_BACKEND=tensorflow"

whereas before it showed only dl. Therefore, all I did to get rid of the above error was to close my jupyter notebook and the Anaconda prompt, then relaunch them a couple of times.

Answer 12 (score: 0)

I encountered this error recently after changing my OS to Windows 10, and I had never encountered it before when using Windows 7.

The error occurs if I load my GPU Tensorflow model while another GPU program is running; it's my JCuda model, loaded as a socket server, which is not large. If I close my other GPU program(s), this Tensorflow model can be loaded perfectly successfully.

The JCuda program is not large at all, only around 70M, while this Tensorflow model is more than 500M and much larger. But I am using a 1080 ti, which has plenty of memory. So it is probably not an out-of-memory problem; rather, it may be some tricky internal issue of Tensorflow regarding the OS or Cuda. (PS: I am using Cuda version 8.0.44 and haven't downloaded a newer one.)

Answer 13 (score: 0)

Restarting my Jupyter processes was not enough; I had to reboot my computer.

Answer 14 (score: 0)

In my case, it was enough to open the Jupyter Notebooks in separate servers.

This error only occurs for me when trying to use more than one Tensorflow/Keras model on the same server. It doesn't matter whether you open one notebook, execute it, close it, and try to open another one. If they are being loaded into the same Jupyter server, the error always happens.

Answer 15 (score: -1)

In my case, the network file system under which libcublas.so was located simply died. The node was rebooted and everything was fine. Just to add another data point to the dataset.