I cloned TensorFlow 1.1.0 and edited the ./configure script to enable the MKL option:
## Set up MKL related environment settings
if true; then # modify this to be true.
  while [ "$TF_NEED_MKL" == "" ]; do
    fromuser=""
    read -p "Do you wish to build TensorFlow with MKL support? [y/N] " INPUT
    fromuser="1"
    case $INPUT in
      [Yy]* ) echo "MKL support will be enabled for TensorFlow"; TF_NEED_MKL=1;;
      [Nn]* ) echo "No MKL support will be enabled for TensorFlow"; TF_NEED_MKL=0;;
      "" ) echo "No MKL support will be enabled for TensorFlow"; TF_NEED_MKL=0;;
      * ) echo "Invalid selection: " $INPUT;;
    esac
  done
Then I ran ./configure:
$ ./configure
Please specify the location of python. [Default is /usr/local/bin/python]:
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: ^C
root@4be8a8788f34:/tensor/tensorflow# ./configure
Please specify the location of python. [Default is /usr/local/bin/python]:
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python2.7/site-packages
/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/site-packages]
Using python library path: /usr/local/lib/python2.7/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N]
No CUDA support will be enabled for TensorFlow
Configuration finished
......................
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.................
INFO: All external dependencies fetched successfully.
I built TensorFlow with the following commands:
bazel build --config=opt --config=mkl --verbose_failures //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Then I installed it with:
pip install --target=$(pwd) /tmp/tensorflow_pkg/tensorflow-1.1.0-cp27-cp27mu-linux_x86_64.whl
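Because `--target` installs the wheel into a custom directory, it may be worth confirming that Python really imports the MKL build from that directory and not another copy on the path. A minimal sketch, assuming only the standard library; the helper name `module_location` is hypothetical, and `json` stands in for `tensorflow` so the snippet runs anywhere:

```python
import importlib
import os


def module_location(name):
    """Return the directory a module or package is loaded from."""
    module = importlib.import_module(name)
    return os.path.dirname(os.path.abspath(module.__file__))


# 'json' is a stand-in; replace it with 'tensorflow' in the real environment.
print(module_location("json"))
```

In the container, the printed directory for `tensorflow` would be expected to sit under the `--target` directory that also appears in the traceback paths.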
To test whether MKL is enabled in this build, I used the following code, which was mentioned in a comment on tensorflow-build-cpu-mkl-windows:
from tensorflow.python.ops import nn_ops
import tensorflow as tf
import numpy as np
images = np.ones((1,1,15,1)).astype(np.float32)
filters = 1 * np.ones((1,1,1,1), np.float32)
with tf.Session(''):
    output = nn_ops.conv2d(
        images,
        filters,
        strides=[1,1,1,1],
        padding='VALID',
        data_format='NCHW',
    ).eval()
    print(output)
The code ran successfully:
[[[[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]
[ 1.]]]]
As the code above shows, nn_ops.conv2d works fine. However, slim.conv2d fails:
# demo from https://www.jianshu.com/p/a70c1d931395
import tensorflow as tf
import tensorflow.contrib.slim as slim
x1 = tf.ones(shape=[1, 64, 64, 3])
w = tf.fill([5, 5, 3, 64], 1)
# print("rank is", tf.rank(x1))
# x1 = tf.cast(x1, tf.float32)
w = tf.cast(w, tf.float32)
print('-----debugging-----')
print(type(x1))
print(x1.dtype.base_dtype)
print(type(w))
print(w.dtype.base_dtype)
print('-------------------')
# x1 = tf.cast(x1, tf.float16)
y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer, padding='SAME')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value,y2_value,x1_value=sess.run([y1,y2,x1])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value==y2_value)
    print(y1_value)
    print(y2_value)
This is the error:
-----debugging-----
<class 'tensorflow.python.framework.ops.Tensor'>
<dtype: 'float32'>
<class 'tensorflow.python.framework.ops.Tensor'>
<dtype: 'float32'>
-------------------
Traceback (most recent call last):
File "/code/test_conv2d.py", line 26, in <module>
sess.run(tf.global_variables_initializer())
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs. Registered devices: [CPU], Registered kernels:
device='CPU'; label='MklLayer'; T in [DT_FLOAT]
[[Node: Conv/BiasAdd = MklConv2DWithBias[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](ones, DMT/_0, Conv/weights/read, DMT/_1, Conv/biases/read, DMT/_4)]]
Caused by op u'Conv/BiasAdd', defined at:
File "/code/test_conv2d.py", line 23, in <module>
y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer, padding='SAME')
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
return func(*args, **current_args)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/contrib/layers/python/layers/layers.py", line 924, in convolution
outputs = layer.apply(inputs)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/layers/base.py", line 323, in apply
return self.__call__(inputs, **kwargs)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/layers/base.py", line 292, in __call__
outputs = self.call(inputs, **kwargs)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/layers/convolutional.py", line 176, in call
data_format=utils.convert_data_format(self.data_format, 4))
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/ops/nn_ops.py", line 1346, in bias_add
return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/ops/gen_nn_ops.py", line 282, in _bias_add
data_format=data_format, name=name)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/framework/op_def_library.py", line 779, in apply_op
op_def=op_def)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs. Registered devices: [CPU], Registered kernels:
device='CPU'; label='MklLayer'; T in [DT_FLOAT]
[[Node: Conv/BiasAdd = MklConv2DWithBias[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](ones, DMT/_0, Conv/weights/read, DMT/_1, Conv/biases/read, DMT/_4)]]
If I use a TensorFlow build without MKL optimization, the problem does not occur.
I can see that MklConv2DWithBias is registered in tensorflow/core/ops/nn_ops.cc:
REGISTER_OP("MklConv2DWithBias")
    .Input("input: T")
    .Input("mkl_input: uint8")
    .Input("filter: T")
    .Input("mkl_filter: uint8")
    .Input("bias: T")
    .Input("mkl_bias: uint8")
    .Output("output: T")
    .Output("mkl_output: uint8")
    .Attr("T: {half, float, double}")
    .Attr("strides: list(int)")
    .Attr("use_cudnn_on_gpu: bool = true")
    .Attr(GetPaddingAttrString())
    .Attr(GetConvnetDataFormatAttrString());
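I read in no-opkernel-was-registered-to-support-op-conv2d-with-these-attrs that both the input and the filter should be float32. I am sure the input is float32, but how do I set the filter to float32 here?
I tried converting the argument types, but that did not help. The error persists: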
slim.conv2d(x1, np.float32(64.0), np.array([5.0, 5.0], dtype=np.float32), weights_initializer=tf.ones_initializer, padding='SAME')
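For what it's worth, the error message itself can be picked apart to compare what the node asks for with what the registered kernel offers. Below is a minimal sketch using only the standard library; the helper name `summarize_kernel_error` is hypothetical and the message text is copied from the traceback above. On this message, the node's `T=DT_FLOAT` already appears in the kernel's type list, so the dtype does not look like the mismatch. The remaining visible difference is the kernel's `label='MklLayer'`; whether a missing label on the node is the actual cause is an assumption that would still need to be checked against the TensorFlow source.

```python
import re

# Error text copied from the traceback (only the relevant lines).
ERROR_TEXT = """No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs. Registered devices: [CPU], Registered kernels:
device='CPU'; label='MklLayer'; T in [DT_FLOAT]
[[Node: Conv/BiasAdd = MklConv2DWithBias[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](ones, DMT/_0, Conv/weights/read, DMT/_1, Conv/biases/read, DMT/_4)]]"""


def summarize_kernel_error(text):
    """Extract the op name, the node's dtype, and each registered kernel's constraints."""
    op = re.search(r"support Op '([^']+)'", text).group(1)
    node_dtype = re.search(r"\[T=(\w+)", text).group(1)
    kernels = []
    pattern = r"device='(\w+)'(?:; label='(\w+)')?; T in \[([^\]]*)\]"
    for match in re.finditer(pattern, text):
        device, label, dtypes = match.groups()
        kernels.append({
            "device": device,
            "label": label,  # None when the kernel has no label
            "dtypes": [d.strip() for d in dtypes.split(",")],
        })
    return {"op": op, "node_dtype": node_dtype, "kernels": kernels}


summary = summarize_kernel_error(ERROR_TEXT)
print(summary["op"], summary["node_dtype"])
for kernel in summary["kernels"]:
    dtype_ok = summary["node_dtype"] in kernel["dtypes"]
    print(kernel["device"], kernel["label"], "dtype matches:", dtype_ok)
```

This also hints at why casting the `slim.conv2d` arguments changes nothing: `num_outputs` and `kernel_size` describe the layer's shape, not the dtype of the weights.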
I found a way to reproduce the problem.
1. Download the tensorflow zip and extract it:
https://drive.google.com/file/d/1k616LEvgTUXpHuX6Fz7EDXukaG5tGZwQ/view
2. Mount the tensorflow folder into a container:
docker run --rm -it -v $(pwd):/code econtal/numpy-mkl bash
3. Create the test script:
$ cd /code
$ vi test_conv2d.py
# demo from https://www.jianshu.com/p/a70c1d931395
import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
x1 = tf.ones(shape=[1, 64, 64, 3])
w = tf.fill([5, 5, 3, 64], 1)
# print("rank is", tf.rank(x1))
# x1 = tf.cast(x1, tf.float32)
w = tf.cast(w, tf.float32)
print('-----debugging-----')
print(type(x1))
print(x1.dtype.base_dtype)
print(type(w))
print(w.dtype.base_dtype)
print('-------------------')
# x1 = tf.cast(x1, tf.float16)
y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
y2 = slim.conv2d(x1, np.float32(64.0), np.array([5.0, 5.0], dtype=np.float32),
                 weights_initializer=tf.ones_initializer, padding='SAME')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value,y2_value,x1_value=sess.run([y1,y2,x1])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value==y2_value)
    print(y1_value)
    print(y2_value)
4. Run the test script:
$ python test_conv2d.py
Can anyone suggest how to track down the cause of this problem? Thanks.