Question

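Running a Keras model... The bad news is that it runs much faster without the CPU extensions than with them. See the output below.

Is there a configuration file where the inter_op_parallelism option can be set?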


 Using TensorFlow backend.
 2018-10-18 17:21:32.620461: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
 2018-10-18 17:21:32.621535: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
 Results: -33.20 (23.69) MSE

 real   2m55.990s
 user   4m8.784s
 sys    3m50.192s

Answer 1

This is the code I use with Keras; just put it at the top of your script.

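import os

NUM_PARALLEL_EXEC_UNITS = 6  # number of CPU cores to use

# Set the OpenMP variables before importing TensorFlow so that the
# threading runtime sees them when it initializes. The KMP_* variables
# are read by Intel's OpenMP runtime (for example in MKL builds).
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
os.environ["KMP_BLOCKTIME"] = "30"
os.environ["KMP_SETTINGS"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"

import tensorflow as tf
from keras import backend as K

# TensorFlow 1.x session configuration
config = tf.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,  # threads used inside a single op
    inter_op_parallelism_threads=1,  # run independent ops one at a time
    allow_soft_placement=True,
    device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)

session = tf.Session(config=config)

# Make Keras use this session instead of creating its own
K.set_session(session)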
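
The snippet above hard-codes NUM_PARALLEL_EXEC_UNITS = 6. A minimal sketch of deriving the value from the machine instead is shown here. The helper names thread_settings and apply_thread_settings are hypothetical, and the sketch assumes that os.cpu_count() is an acceptable stand-in for the core count; it reports logical cores, so on a hyperthreaded machine it may be higher than the number of physical cores.

```python
import os


def thread_settings(num_threads=None):
    """Build the OpenMP-related environment variables for num_threads threads.

    Hypothetical helper: the KMP_* values are copied from the answer above,
    only the thread count is derived from the machine.
    """
    if num_threads is None:
        # os.cpu_count() may return None; fall back to a single thread.
        num_threads = os.cpu_count() or 1
    return {
        "OMP_NUM_THREADS": str(num_threads),
        "KMP_BLOCKTIME": "30",
        "KMP_SETTINGS": "1",
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
    }


def apply_thread_settings(num_threads=None):
    """Export the settings to os.environ and return the thread count used."""
    settings = thread_settings(num_threads)
    os.environ.update(settings)
    return int(settings["OMP_NUM_THREADS"])
```

Calling apply_thread_settings() before TensorFlow is imported and passing the returned number as intra_op_parallelism_threads would keep both settings in sync.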

Note: I am a little disappointed with the results. With these parameters alone, the best I could reach was a 150% speedup.
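
To put a number on a speedup like this, one option is to time the same workload under each configuration and compare the wall-clock times. The sketch below uses only the standard library; best_wall_time and speed_percent are hypothetical helper names, and the workload passed in stands for whatever training or prediction call is being measured.

```python
import time


def best_wall_time(fn, repeats=3):
    """Return the shortest wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speed_percent(baseline_seconds, tuned_seconds):
    """Speed of the tuned run as a percentage of the baseline speed."""
    return 100.0 * baseline_seconds / tuned_seconds
```

Because the environment variables and the session configuration are fixed once the process has started, each configuration would probably need to be timed in a fresh Python process for the comparison to be fair.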

Why is a Keras model faster on a "bare metal" CPU?
