当在tensorflow gpu中训练卷积神经网络时,“Python已停止工作”

时间:2017-07-08 21:11:09

标签: python tensorflow tensorflow-gpu

这是我用于测试卷积神经网络的张量流代码,只有1个卷积和池化层,只有1个完全连接的512个神经元层。

我的数据集只有2张图片:http://imgur.com/et1Sn1khttp://imgur.com/ZWxOGgO

当我训练我的网络时,窗口弹出“Python已停止”(ss:http://imgur.com/tc5jWlA

这是我的代码:

from scipy.misc import imread
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image as Image

base_image = imread("base_image.jpg")
subject_image = imread("subject_image.jpg")

base_image = np.resize(base_image, [1024, 768, 3])
subject_image = np.resize(subject_image, [1024, 768, 3])

images = []

images.append(base_image)
images.append(subject_image)

# hyper parameters

epochs = 10
batch_size = 2
learning_rate = 0.01
n_classes = 2
import tensorflow as tf

# Model

x = tf.placeholder('float', [2, 1024, 768, 3])
y = tf.placeholder('float', [2])

weights = {
            "conv": tf.random_normal([10, 10, 3, 32]),
            "fc": tf.random_normal([-1, 512]), #7*7*64
            "out": tf.Variable(tf.random_normal([512, n_classes]))
}

conv = tf.nn.conv2d(x, filter=weights['conv'], strides=[1, 1, 1, 1], padding="SAME")
conv = tf.nn.max_pool(value=conv, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")

fc = tf.reshape(conv, shape=[2,-1])
fc = tf.nn.relu(tf.matmul(fc, weights['fc']))

output = tf.matmul(fc, weights['out'])

loss = tf.reduce_mean((output - y)**2)

train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

sess = tf.Session()

sess.run(tf.global_variables_initializer())

for i in range(epochs):
    sess.run(train, feed_dict={x: images, y: [0, 1]})
    print(i)

输出:

2017-07-09 02:29:43.688699: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.689131: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.689504: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.689998: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.690380: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.690646: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.691117: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:43.691436: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-09 02:29:44.197766: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
2017-07-09 02:29:44.198207: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0 
2017-07-09 02:29:44.198391: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y 
2017-07-09 02:29:44.198643: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
2017-07-09 02:29:44.881448: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Invalid argument: Dimension -1 must be >= 0
2017-07-09 02:29:44.881875: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Invalid argument: Dimension -1 must be >= 0
     [[Node: random_normal_1/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/gpu:0"](random_normal_1/shape)]]
2017-07-09 02:29:44.882604: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Invalid argument: Dimension -1 must be >= 0
     [[Node: random_normal_1/RandomStandardNormal = RandomStandardNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/gpu:0"](random_normal_1/shape)]]
2017-07-09 02:29:45.362917: E c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2017-07-09 02:29:45.364154: F c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\kernels\conv_ops.cc:671] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
[Finished in 22.7s with exit code 3221226505]
[shell_cmd: python -u "E:\workspace_py\convolotion_neural_net.py"]
[dir: E:\workspace_py]
[path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\libnvvp;E:\Program Files\Python 3.5\Scripts\;E:\Program Files\Python 3.5\;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Python27\Scripts;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;E:\Program Files\MATLAB\runtime\win64;E:\Program Files\MATLAB\bin;E:\Program Files\MATLAB\polyspace\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Python27\Scripts;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;E:\Program Files\MATLAB\runtime\win64;E:\Program Files\MATLAB\bin;E:\Program Files\MATLAB\polyspace\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\Users\guita\AppData\Local\Microsoft\WindowsApps;E:\Program Files\Python27\Scipts;e:\Program Files (x86)\Microsoft VS Code\bin;C:\Users\guita\AppData\Local\atom\bin]

我的电脑规格(联想Y50): Nvidia GTX 960m 4 GB内存, 英特尔I7第四代, 8 GB RAM

Python 3.5 + Tensorflow with GPU

2 个答案:

答案 0 :(得分:1)

所以我设法自己解决了这个问题。发生这种情况是因为我的cuDNN版本是6.0但是tensorflow效果最好5.1。此外,代码中存在一些细微的错误,但这些错误无关紧要:p

答案 1 :(得分:0)

你的描述'闻起来'缺乏资源,很可能是记忆。

如果每次运行代码时都会发生这种情况,那么问题可能在您的代码中 我建议您使用memory_profilerheapy等工具来分析其内存消耗情况。

如果偶尔发生这种情况,那么与Python进程同时运行的其他进程会占用大量资源,而这些资源还不足以用于Python进程。