TensorFlow gets exit code 139, interrupted by signal 11: SIGSEGV, when running any optimizer

Date: 2018-11-18 13:32:07

Tags: tensorflow sigsegv

I ran the code on an RTX 2080 using Docker. As soon as I call sess.run(train_step, feed_dict={...}), I get "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)." However, if I run it on the CPU, it works fine. I don't know what is going on.
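A quick way to confirm the CPU/GPU split inside the same container is to pin the whole graph to the CPU with tf.device (a minimal sketch, not part of the original code; the placeholder shape mirrors the listing below):

import tensorflow as tf

# Build everything under an explicit CPU device scope. If training then
# completes inside the GPU container, the segfault is specific to the
# GPU code path (CUDA/cuDNN/driver), not to the model itself.
with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, [None, 3072])
    logits = tf.layers.dense(x, 10)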

Using TensorFlow backend.

2018-11-18 13:19:12.025412: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-18 13:19:12.132999: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-18 13:19:12.133566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8 pciBusID: 0000:06:00.0 totalMemory: 7.76GiB freeMemory: 7.46GiB
2018-11-18 13:19:12.133584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-18 13:19:12.394726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-18 13:19:12.394763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-18 13:19:12.394770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-18 13:19:12.394963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7172 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:06:00.0, compute capability: 7.5)
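The log shows TensorFlow claiming nearly all of the card's memory (7172 MB) up front. A common first step when debugging GPU-side crashes is to create the session with explicit GPU options instead of the bare tf.InteractiveSession() used below (a minimal sketch using the TF 1.x ConfigProto API):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
config.log_device_placement = True      # log which device each op runs on
sess = tf.InteractiveSession(config=config)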

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.contrib.framework import arg_scope
from keras.layers import Dense, Activation  # unused below, but importing keras prints "Using TensorFlow backend."
import pickle
from tensorflow.contrib.layers import batch_norm, flatten

train_data = {b'data': [], b'labels': []}
# Load the five pickled CIFAR-10 training batches
for i in range(5):
    with open("data/cifar-10/data_batch_" + str(i + 1), mode='rb') as file:
        data = pickle.load(file, encoding='bytes')
        train_data[b'data'] += list(data[b'data'])
        train_data[b'labels'] += data[b'labels']
# Load the test batch
with open("data/cifar-10/test_batch", mode='rb') as file:
    test_data = pickle.load(file, encoding='bytes')
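# Note (added for clarity): each CIFAR-10 batch file unpickles to a dict whose
# b'data' entry is a 10000 x 3072 uint8 array (per row: 1024 red, 1024 green,
# 1024 blue values) and whose b'labels' entry is a list of 10000 ints. This
# channels-first layout is why the input is later reshaped to [-1, 3, 32, 32]
# and then transposed.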
# Define some constants
NUM_LABLES = 10  # number of output classes
BATCH_SIZE = 64  # samples per training batch

sess = tf.InteractiveSession()


# Weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=2 / shape[0] / shape[1] / shape[2])
    #   initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial)


# Initialize convolution-layer biases to the constant 0.1
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


# Convolution with stride 1; padding='SAME' means zero padding
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


# Max pooling with a 2x2 window, stride 2, and zero padding
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Placeholders: input is a BATCH x 3072 vector, output a BATCH x 10 one-hot vector
x = tf.placeholder(tf.float32, [None, 3072])
y_ = tf.placeholder(tf.float32, [None, NUM_LABLES])
# Reshape the flat input to 3 x 32 x 32 (channels first)
x_image = tf.reshape(x, [-1, 3, 32, 32])
# Transpose to the 32 x 32 x 3 (NHWC) layout that conv2d expects; 32 x 32 are the 2-D convolution dimensions
x_image = tf.transpose(x_image, [0, 2, 3, 1])

# First convolutional layer: 32 filters of shape 3 x 3 x 3
#bn_layer1 = Batch_Normalization(x_image, istraining, "bn1")
W_conv1 = weight_variable([3, 3, 3, 32])
b_conv1 = bias_variable([32])
h_conv1 = conv2d(x_image, W_conv1) + b_conv1
#h_conv1 = tf.layers.dropout(inputs=h_conv1, rate=droprate, training=istraining)
h_relu1 = tf.nn.relu(h_conv1)  # ReLU activation
h_pool1 = max_pool_2x2(h_relu1)  # max pooling

h_pool4 = tf.reshape(h_pool1, [-1, 16 * 16 * 32])  # flatten: one 2x2 pool halves 32x32 to 16x16
bn_layer5_flat = tf.layers.dense(inputs=h_pool4, units=10, name='linear')

cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_, logits=bn_layer5_flat,
                                                reduction=tf.losses.Reduction.MEAN)
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(bn_layer5_flat, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess.run(tf.global_variables_initializer())
x_train = np.array(train_data[b'data']) / 255
y_train = np.array(pd.get_dummies(train_data[b'labels']))
x_test = test_data[b'data'] / 255
y_test = np.array(pd.get_dummies(test_data[b'labels']))
eplr = 1e-4  # intended learning-rate schedule; note it is never actually passed to AdamOptimizer above
for i in range(20000):
    if i == 20000 * 0.5 or i == 20000 * 0.75:
        eplr = eplr / 10
    start = i * BATCH_SIZE % (50000 - BATCH_SIZE)  # sliding window over the 50000 training samples
    sess.run(train_step, feed_dict={x: x_train[start: start + BATCH_SIZE],
                                    y_: y_train[start: start + BATCH_SIZE],
                                    })
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: x_test[0: 200],
                                                  y_: y_test[0: 200]
                                                  })
        loss_value = cross_entropy.eval(feed_dict={x: x_train[start: start + BATCH_SIZE],
                                                   y_: y_train[start: start + BATCH_SIZE]
                                                   })
        print("step %d, trainning accuracy, %g loss %g" % (i, train_accuracy, loss_value))

test_accuracy = accuracy.eval(feed_dict={x: x_test, y_: y_test})
print("test accuracy %g" % test_accuracy)

0 Answers:

No answers