Question

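Every time I try to train a model with a convolutional neural network, I get the error below. My graphics card is a GeForce GTX 1050 4GB with CUDA compute capability 6.1.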

Error: tensorflow.python.framework.errors_impl.InternalError: CUB reduce errorinvalid configuration argument [[{{node FUllYConnected_1/weight_regularizer/Sum}}]] [[{{node metrics/acc/Mean}}]]

So far I have tried the following settings:

from keras import backend as K
import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto, tf.Session)

# First attempt: create the session inside a /gpu:0 device scope
with tf.device('/gpu:0'):
    config = tf.ConfigProto(intra_op_parallelism_threads=4,
                            # inter_op_parallelism_threads=4,
                            allow_soft_placement=True,
                            device_count={'CPU': 4, 'GPU': 1})
    session = tf.Session(config=config)
    K.set_session(session)

# Second attempt: a single intra-op thread, one CPU and one GPU
config = tf.ConfigProto(intra_op_parallelism_threads=1,
                        allow_soft_placement=True,
                        device_count={'GPU': 1, 'CPU': 1})
sess = tf.Session(config=config)
K.set_session(sess)

The full log is below:

2019-04-24 20:06:26.915635: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-04-24 20:06:27.820975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 4.00GiB freeMemory: 3.30GiB
2019-04-24 20:06:27.828955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-24 20:06:28.433254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-24 20:06:28.438610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-24 20:06:28.441346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-24 20:06:28.444947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3011 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Found 670341 images belonging to 36 classes.
Found 13700 images belonging to 36 classes.
WARNING:tensorflow:From D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-04-24 20:07:00.696348: E tensorflow/core/grappler/clusters/utils.cc:83] Failed to get device properties, error code: 30
Failed to initialize GPU device #0: unknown error
WARNING:tensorflow:From D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\backend\tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
1stInput (InputLayer) (None, 32, 32, 1) 0
_________________________________________________________________
1st (Conv2D) (None, 32, 32, 32) 832
_________________________________________________________________
batchNorm1 (BatchNormalizati (None, 32, 32, 32) 128
_________________________________________________________________
Relu1 (ReLU) (None, 32, 32, 32) 0
_________________________________________________________________
pool3 (MaxPooling2D) (None, 32, 32, 32) 0
_________________________________________________________________
dropout1 (Dropout) (None, 32, 32, 32) 0
_________________________________________________________________
2nd_conv (Conv2D) (None, 32, 32, 64) 18496
_________________________________________________________________
batchNorm2 (BatchNormalizati (None, 32, 32, 64) 256
_________________________________________________________________
relu2 (ReLU) (None, 32, 32, 64) 0
_________________________________________________________________
pool4 (MaxPooling2D) (None, 32, 32, 64) 0
_________________________________________________________________
dropout2 (Dropout) (None, 32, 32, 64) 0
_________________________________________________________________
3rd_conv (Conv2D) (None, 32, 32, 86) 49622
_________________________________________________________________
batchNorm3 (BatchNormalizati (None, 32, 32, 86) 344
_________________________________________________________________
relu3 (ReLU) (None, 32, 32, 86) 0
_________________________________________________________________
pool5 (MaxPooling2D) (None, 16, 16, 86) 0
_________________________________________________________________
dropout3 (Dropout) (None, 16, 16, 86) 0
_________________________________________________________________
FlattenLayer1 (Flatten) (None, 22016) 0
_________________________________________________________________
FUllYConnected_1 (Dense) (None, 1024) 22545408
_________________________________________________________________
batchNorm4 (BatchNormalizati (None, 1024) 4096
_________________________________________________________________
relu4 (ReLU) (None, 1024) 0
_________________________________________________________________
dropout6 (Dropout) (None, 1024) 0
_________________________________________________________________
FUll_2 (Dense) (None, 1024) 1049600
_________________________________________________________________
batchNorm5 (BatchNormalizati (None, 1024) 4096
_________________________________________________________________
relu5 (ReLU) (None, 1024) 0
_________________________________________________________________
dropout7 (Dropout) (None, 1024) 0
_________________________________________________________________
FUll_3 (Dense) (None, 36) 36900
=================================================================
Total params: 23,709,778
Trainable params: 23,705,318
Non-trainable params: 4,460
_________________________________________________________________
None
Model configured..!
WARNING:tensorflow:From D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/30
2019-04-24 20:07:12.231219: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
Traceback (most recent call last):
File "CNN_mainScript.py", line 74, in <module>
obj.training()
File "D:\Projects\OCR-master\LSTMTraining.py", line 234, in training
workers=1
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\engine\training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
outputs = self.train_function(ins)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
run_metadata_ptr)
File "D:\AnacondaInstalled\envs\tfgpu\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: CUB reduce errorinvalid configuration argument
[[{{node FUllYConnected_1/weight_regularizer/Sum}}]]
[[{{node metrics/acc/Mean}}]]

Can anyone help me resolve this issue?

Answer 1

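It was only resolved after upgrading the packages. I think it is related to the CUDA framework: once I upgraded TensorFlow and its related packages to newer versions, the problem went away.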
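
Before upgrading, it can help to record which versions are currently installed so you can compare them afterwards. The following is only a minimal sketch: the helper name `installed_versions` and the package names in the example call are assumptions, not something taken from the setup above, and it only reports versions; it does not tell you which versions fix the error.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names are assumptions; adjust them to match your environment.
    report = installed_versions(["tensorflow", "tensorflow-gpu", "keras"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running it once before and once after the upgrade, inside the same environment (here that would be the `tfgpu` Anaconda environment shown in the log), gives a simple record of what changed.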
