Segmentation fault when running layers.conv2d

Asked: 2018-10-10 19:22:30

Tags: python tensorflow

I have a simple test script that illustrates a problem I am running into. I am trying to implement a CNN with TensorFlow, but I started getting segmentation faults when I changed the input size. In the test script I can run it successfully with n_H = 3000, but when I set n_H = 4000 I get a segmentation fault. Also, if I run it without tf.layers.conv2d by setting with_conv = False, the script runs successfully. Does anyone know what my problem is?

I am running this on a host with 12 CPUs. I don't really understand TensorFlow's message about "Creating new thread pool with default inter op setting: 2", and I don't know whether it is related to my problem.
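
For reference, the inter-op and intra-op thread pools that this message refers to can be configured explicitly when the session is created. A minimal TF 1.x sketch; the thread counts below are arbitrary placeholders, not a recommendation:

import tensorflow as tf

# Limit the number of threads used to run independent ops (inter-op)
# and the number of threads used inside a single op (intra-op).
config = tf.ConfigProto(inter_op_parallelism_threads=2,
                        intra_op_parallelism_threads=12)
with tf.Session(config=config) as sess:
    pass  # run the graph as usual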

Here is the output when I hit the segmentation fault:

$ python test.py
(100, 4000, 100, 1) (100, 8)
2018-10-10 11:57:23.825704: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-10-10 11:57:23.827653: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Fatal Python error: Segmentation fault

Thread 0x00007fea9af1a740 (most recent call first):
  File "/home/seng/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350 in _call_tf_sessionrun
  File "/home/seng/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263 in _run_fn
  File "/home/seng/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278 in _do_call
  File "/home/seng/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272 in _do_run
  File "/home/seng/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100 in _run
  File "/home/seng/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877 in run
  File "test.py", line 43 in <module>
Segmentation fault (core dumped)

Here is the test script:

import faulthandler; faulthandler.enable()
import numpy as np
import tensorflow as tf

seed = 42
tf.reset_default_graph()
tf.set_random_seed(seed)
np.random.seed(seed)

num_examples = 100
n_H = 4000
with_conv = True

# generate training data
X_gen = np.random.randn(num_examples*n_H*100).reshape(num_examples, n_H, 100, 1)
Y_gen = np.random.randn(num_examples*8).reshape(num_examples, 8)    
X_train = X_gen[0:num_examples, 0:n_H, ...]
Y_train = Y_gen[0:num_examples, ...]
print(X_train.shape, Y_train.shape)

# create placeholders
X = tf.placeholder(tf.float32, shape=(None, n_H, 100, 1))
Y = tf.placeholder(tf.float32, shape=(None, 8))

# build graph
if (with_conv):
    conv1 = tf.layers.conv2d(X, filters=64, kernel_size=[5, 5], strides=1, padding='valid', activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(conv1, pool_size=[2, 2], strides=2, padding='valid')
else:
    pool1 = tf.layers.max_pooling2d(X, pool_size=[2, 2], strides=2, padding='valid')
pool1_flat = tf.layers.flatten(pool1)
dense2 = tf.layers.dense(pool1_flat, units=256, activation=tf.nn.relu)
H = tf.layers.dense(dense2, units=8, activation=tf.nn.relu)

# compute cost
cost = tf.reduce_mean(tf.square(Y - H))

# initialize variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    a = sess.run(cost, feed_dict={X: X_train, Y: Y_train})
    print(a)

1 Answer:

Answer 0 (score: 0)

In general, a segmentation fault like this means your host does not have enough RAM to run the script. When you increase n_H you add no parameters to the convolutional layer itself, but you do add a huge number of parameters to the dense2 layer, because the flattened pooling output it is connected to grows with the input height. You also add many more operations (and much larger intermediate activations) to the conv1 layer, since its input is much bigger.
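
A rough back-of-envelope estimate based on the shapes in the test script ('valid' padding, 5x5 convolution, 2x2 pooling with stride 2) illustrates the scale. The numbers assume float32 and ignore TensorFlow's own overhead, and the estimate() helper below is only for illustration:

def estimate(n_H, batch=100, width=100, filters=64, dense_units=256):
    # conv1 output with 'valid' padding and a 5x5 kernel: (n_H - 4, width - 4, filters)
    conv_h, conv_w = n_H - 4, width - 4
    # pool1 output after 2x2 max pooling with stride 2
    pool_h, pool_w = conv_h // 2, conv_w // 2
    flat = pool_h * pool_w * filters                     # size of pool1_flat per example
    dense2_weights = flat * dense_units                  # dense2 weight matrix
    conv1_activations = batch * conv_h * conv_w * filters
    to_gb = 4 / 1e9                                      # float32 bytes -> GB
    print("n_H=%d: dense2 weights ~%.1f GB, conv1 activations ~%.1f GB"
          % (n_H, dense2_weights * to_gb, conv1_activations * to_gb))

estimate(3000)   # dense2 weights ~4.7 GB, conv1 activations ~7.4 GB
estimate(4000)   # dense2 weights ~6.3 GB, conv1 activations ~9.8 GB

Either of those alone runs to several gigabytes, so the jump from n_H = 3000 to n_H = 4000 can plausibly push the process past the host's available RAM.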

It is quite possible that the ops contained in the conv1 layer, together with its roughly 1,600 parameters, saturate the RAM and prevent the script from running. Try tracking your RAM usage with htop or any other monitor.
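
If htop is not handy, the peak memory the process has used so far can also be printed from inside the script with the standard library (this only helps for runs that get far enough to reach the print, e.g. the n_H = 3000 case). A minimal sketch:

import resource

# Peak resident set size of the current process.
# On Linux ru_maxrss is reported in kilobytes; on macOS it is in bytes.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %.1f GB" % (peak_kb / 1e6))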