TensorFlow gets exit code 139, interrupted by signal 11: SIGSEGV, when running any optimizer

Date: 2018-11-18 13:32:07

Tags: tensorflow sigsegv

I ran the code on an RTX 2080 using Docker. As soon as I call sess.run(train_step, feed_dict={...}), I get "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)." However, if I run it on the CPU, it works fine. I don't know what is going on.
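A quick way to confirm the CPU/GPU split inside the same container is to pin the whole graph to the CPU with tf.device (a minimal sketch, not part of the original code; the placeholder shape mirrors the listing below):

import tensorflow as tf

# Build everything under an explicit CPU device scope. If training then
# completes inside the GPU container, the segfault is specific to the
# GPU code path (CUDA/cuDNN/driver), not to the model itself.
with tf.device('/cpu:0'):
    x = tf.placeholder(tf.float32, [None, 3072])
    logits = tf.layers.dense(x, 10)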

Using TensorFlow backend.

2018-11-18 13:19:12.025412: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-18 13:19:12.132999: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-18 13:19:12.133566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8 pciBusID: 0000:06:00.0 totalMemory: 7.76GiB freeMemory: 7.46GiB
2018-11-18 13:19:12.133584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-18 13:19:12.394726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-18 13:19:12.394763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-18 13:19:12.394770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-18 13:19:12.394963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7172 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:06:00.0, compute capability: 7.5)
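The log shows TensorFlow claiming nearly all of the card's memory (7172 MB) up front. A common first step when debugging GPU-side crashes is to create the session with explicit GPU options instead of the bare tf.InteractiveSession() used below (a minimal sketch using the TF 1.x ConfigProto API):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
config.log_device_placement = True      # log which device each op runs on
sess = tf.InteractiveSession(config=config)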

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.contrib.framework import arg_scope
from keras.layers import Dense, Activation  # unused below, but importing keras prints "Using TensorFlow backend."
import pickle
from tensorflow.contrib.layers import batch_norm, flatten

train_data = {b'data': [], b'labels': []}
# Load the five pickled CIFAR-10 training batches
for i in range(5):
    with open("data/cifar-10/data_batch_" + str(i + 1), mode='rb') as file:
        data = pickle.load(file, encoding='bytes')
        train_data[b'data'] += list(data[b'data'])
        train_data[b'labels'] += data[b'labels']
# Load the test batch
with open("data/cifar-10/test_batch", mode='rb') as file:
    test_data = pickle.load(file, encoding='bytes')
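# Note (added for clarity): each CIFAR-10 batch file unpickles to a dict whose
# b'data' entry is a 10000 x 3072 uint8 array (per row: 1024 red, 1024 green,
# 1024 blue values) and whose b'labels' entry is a list of 10000 ints. This
# channels-first layout is why the input is later reshaped to [-1, 3, 32, 32]
# and then transposed.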
# Define some constants
NUM_LABLES = 10  # number of output classes
BATCH_SIZE = 64  # samples per training batch

sess = tf.InteractiveSession()


# Weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=2 / shape[0] / shape[1] / shape[2])
    #   initial = tf.truncated_normal(shape, stddev=0.01)
    return tf.Variable(initial)


# Initialize convolution-layer biases to the constant 0.1
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


# Convolution with stride 1; padding='SAME' means zero padding
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


# Max pooling with a 2x2 window, stride 2, and zero padding
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Placeholders: input is a BATCH x 3072 vector, output a BATCH x 10 one-hot vector
x = tf.placeholder(tf.float32, [None, 3072])
y_ = tf.placeholder(tf.float32, [None, NUM_LABLES])
# Reshape the flat input to 3 x 32 x 32 (channels first)
x_image = tf.reshape(x, [-1, 3, 32, 32])
# Transpose to the 32 x 32 x 3 (NHWC) layout that conv2d expects; 32 x 32 are the 2-D convolution dimensions
x_image = tf.transpose(x_image, [0, 2, 3, 1])

# First convolutional layer: 32 filters of shape 3 x 3 x 3
#bn_layer1 = Batch_Normalization(x_image, istraining, "bn1")
W_conv1 = weight_variable([3, 3, 3, 32])
b_conv1 = bias_variable([32])
h_conv1 = conv2d(x_image, W_conv1) + b_conv1
#h_conv1 = tf.layers.dropout(inputs=h_conv1, rate=droprate, training=istraining)
h_relu1 = tf.nn.relu(h_conv1)  # ReLU activation
h_pool1 = max_pool_2x2(h_relu1)  # max pooling

h_pool4 = tf.reshape(h_pool1, [-1, 16 * 16 * 32])  # flatten: one 2x2 pool halves 32x32 to 16x16
bn_layer5_flat = tf.layers.dense(inputs=h_pool4, units=10, name='linear')

cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_, logits=bn_layer5_flat,
                                                reduction=tf.losses.Reduction.MEAN)
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(bn_layer5_flat, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess.run(tf.global_variables_initializer())
x_train = np.array(train_data[b'data']) / 255
y_train = np.array(pd.get_dummies(train_data[b'labels']))
x_test = test_data[b'data'] / 255
y_test = np.array(pd.get_dummies(test_data[b'labels']))
eplr = 1e-4  # intended learning-rate schedule; note it is never actually passed to AdamOptimizer above
for i in range(20000):
    if i == 20000 * 0.5 or i == 20000 * 0.75:
        eplr = eplr / 10
    start = i * BATCH_SIZE % (50000 - BATCH_SIZE)  # sliding window over the 50000 training samples
    sess.run(train_step, feed_dict={x: x_train[start: start + BATCH_SIZE],
                                    y_: y_train[start: start + BATCH_SIZE],
                                    })
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: x_test[0: 200],
                                                  y_: y_test[0: 200]
                                                  })
        loss_value = cross_entropy.eval(feed_dict={x: x_train[start: start + BATCH_SIZE],
                                                   y_: y_train[start: start + BATCH_SIZE]
                                                   })
        print("step %d, trainning accuracy, %g loss %g" % (i, train_accuracy, loss_value))

test_accuracy = accuracy.eval(feed_dict={x: x_test, y_: y_test})
print("test accuracy %g" % test_accuracy)

0 Answers:

No answers