Question

I cloned TensorFlow 1.1.0 and modified the ./configure file to enable the MKL option:

## Set up MKL related environment settings
if true; then # modify this to be true.
  while [ "$TF_NEED_MKL" == "" ]; do
    fromuser=""
    read -p "Do you wish to build TensorFlow with MKL support? [y/N] " INPUT
    fromuser="1"
    case $INPUT in
      [Yy]* ) echo "MKL support will be enabled for TensorFlow"; TF_NEED_MKL=1;;
      [Nn]* ) echo "No MKL support will be enabled for TensorFlow"; TF_NEED_MKL=0;;
      "" ) echo "No MKL support will be enabled for TensorFlow"; TF_NEED_MKL=0;;
      * ) echo "Invalid selection: " $INPUT;;
    esac
  done

Then I ran ./configure:

$ ./configure
Please specify the location of python. [Default is /usr/local/bin/python]:
Do you wish to build TensorFlow with MKL support? [y/N]
No MKL support will be enabled for TensorFlow
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: ^C
root@4be8a8788f34:/tensor/tensorflow# ./configure
Please specify the location of python. [Default is /usr/local/bin/python]:
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python2.7/site-packages
  /tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/site-packages]

Using python library path: /usr/local/lib/python2.7/site-packages
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N]
No CUDA support will be enabled for TensorFlow
Configuration finished
......................
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.................
INFO: All external dependencies fetched successfully.

I built TensorFlow with the following commands:

bazel build --config=opt --config=mkl --verbose_failures //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Then I installed it with:

pip install --target=$(pwd) /tmp/tensorflow_pkg/tensorflow-1.1.0-cp27-cp27mu-linux_x86_64.whl

To test whether MKL is enabled in TensorFlow, I used this code, taken from a comment on tensorflow-build-cpu-mkl-windows:

from tensorflow.python.ops import nn_ops
import tensorflow as tf
import numpy as np

images = np.ones((1,1,15,1)).astype(np.float32)
filters = 1 * np.ones((1,1,1,1), np.float32)

with tf.Session(''):
    output = nn_ops.conv2d(
        images,
        filters,
        strides=[1,1,1,1],
        padding='VALID',
        data_format='NCHW',
    ).eval()
    print(output)

The code ran successfully:

[[[[ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]
   [ 1.]]]]
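
The shape of that output can be checked by hand. Below is a minimal sketch, assuming the usual shape rules for a strided 2-D convolution with VALID or SAME padding. `conv2d_output_shape` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides, padding,
                        data_format="NHWC"):
    """Return the 4-D output shape of a 2-D convolution.

    filter_shape is (height, width, in_channels, out_channels);
    input_shape and strides are laid out according to data_format.
    """
    if data_format == "NHWC":
        n, h, w, c = input_shape
        _, stride_h, stride_w, _ = strides
    elif data_format == "NCHW":
        n, c, h, w = input_shape
        _, _, stride_h, stride_w = strides
    else:
        raise ValueError("unsupported data_format: %r" % (data_format,))

    filter_h, filter_w, in_channels, out_channels = filter_shape
    if c != in_channels:
        raise ValueError("input has %d channels, filter expects %d"
                         % (c, in_channels))

    if padding == "VALID":
        # No padding: only positions where the filter fits entirely.
        out_h = int(math.ceil((h - filter_h + 1) / float(stride_h)))
        out_w = int(math.ceil((w - filter_w + 1) / float(stride_w)))
    elif padding == "SAME":
        # Padded so that the output covers every stride position.
        out_h = int(math.ceil(h / float(stride_h)))
        out_w = int(math.ceil(w / float(stride_w)))
    else:
        raise ValueError("unsupported padding: %r" % (padding,))

    if data_format == "NHWC":
        return (n, out_h, out_w, out_channels)
    return (n, out_channels, out_h, out_w)


# The NCHW test above: a 1x1 filter over a (1, 1, 15, 1) input.
print(conv2d_output_shape((1, 1, 15, 1), (1, 1, 1, 1), [1, 1, 1, 1],
                          "VALID", "NCHW"))
```

With a 1x1 filter of ones the convolution is an identity, so the fifteen values of 1 printed above are what this shape and input would predict.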

As the code shows, nn_ops.conv2d works fine. With slim.conv2d, however, there is a problem:

# demo from https://www.jianshu.com/p/a70c1d931395
import tensorflow as tf 
import tensorflow.contrib.slim as slim

x1 = tf.ones(shape=[1, 64, 64, 3]) 
w = tf.fill([5, 5, 3, 64], 1)
# print("rank is", tf.rank(x1))

# x1 = tf.cast(x1, tf.float32)
w = tf.cast(w, tf.float32)

print('-----debugging-----')
print(type(x1))
print(x1.dtype.base_dtype)

print(type(w))
print(w.dtype.base_dtype)
print('-------------------')

# x1 = tf.cast(x1, tf.float16)

y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer, padding='SAME')

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 
    y1_value,y2_value,x1_value=sess.run([y1,y2,x1])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value==y2_value)
    print(y1_value)
    print(y2_value)

The error occurs:

-----debugging-----
<class 'tensorflow.python.framework.ops.Tensor'>
<dtype: 'float32'>
<class 'tensorflow.python.framework.ops.Tensor'>
<dtype: 'float32'>
-------------------

Traceback (most recent call last):
  File "/code/test_conv2d.py", line 26, in <module>
    sess.run(tf.global_variables_initializer())
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='CPU'; label='MklLayer'; T in [DT_FLOAT]

     [[Node: Conv/BiasAdd = MklConv2DWithBias[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](ones, DMT/_0, Conv/weights/read, DMT/_1, Conv/biases/read, DMT/_4)]]

Caused by op u'Conv/BiasAdd', defined at:
  File "/code/test_conv2d.py", line 23, in <module>
    y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer, padding='SAME')
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/contrib/layers/python/layers/layers.py", line 924, in convolution
    outputs = layer.apply(inputs)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/layers/base.py", line 323, in apply
    return self.__call__(inputs, **kwargs)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/layers/base.py", line 292, in __call__
    outputs = self.call(inputs, **kwargs)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/layers/convolutional.py", line 176, in call
    data_format=utils.convert_data_format(self.data_format, 4))
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/ops/nn_ops.py", line 1346, in bias_add
    return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/ops/gen_nn_ops.py", line 282, in _bias_add
    data_format=data_format, name=name)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/framework/op_def_library.py", line 779, in apply_op
    op_def=op_def)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/tensor/target/applib-runtime-tensorflow-mkl-with-cpu-without-numpy/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='CPU'; label='MklLayer'; T in [DT_FLOAT]

     [[Node: Conv/BiasAdd = MklConv2DWithBias[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](ones, DMT/_0, Conv/weights/read, DMT/_1, Conv/biases/read, DMT/_4)]]

If I use a TensorFlow build without MKL optimization, the problem does not occur.
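
For reference, the values a working build should print can be worked out without TensorFlow. The sketch below assumes a stride-1 SAME convolution with an odd kernel size, an all-ones input and an all-ones filter, as in the script above. It also assumes slim.conv2d keeps its default zero-initialized bias and ReLU activation, which would leave these positive values unchanged, so y1 and y2 should match. `expected_same_conv_of_ones` is a hypothetical helper name.

```python
def expected_same_conv_of_ones(size=64, kernel=5, in_channels=3):
    """Expected result of a stride-1 SAME convolution of ones with ones.

    Every output channel is identical, so a single size x size grid
    (a list of rows) is returned. An odd kernel size is assumed.
    """
    half = kernel // 2

    def covered(i):
        # Number of kernel taps along one axis that land on real pixels
        # rather than on zero padding.
        low = max(i - half, 0)
        high = min(i + half, size - 1)
        return high - low + 1

    return [[in_channels * covered(r) * covered(c) for c in range(size)]
            for r in range(size)]


grid = expected_same_conv_of_ones()
print(grid[0][0])    # corner: 3 x 3 window over 3 channels
print(grid[32][32])  # interior: full 5 x 5 window over 3 channels
```

Corners see only a 3 x 3 patch of real pixels, while interior positions see the full 5 x 5 patch across all three input channels.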

I can see that MklConv2DWithBias is registered in tensorflow/core/ops/nn_ops.cc:

REGISTER_OP("MklConv2DWithBias")
    .Input("input: T")
    .Input("mkl_input: uint8")
    .Input("filter: T")
    .Input("mkl_filter: uint8")
    .Input("bias: T")
    .Input("mkl_bias: uint8")
    .Output("output: T")
    .Output("mkl_output: uint8")
    .Attr("T: {half, float, double}")
    .Attr("strides: list(int)")
    .Attr("use_cudnn_on_gpu: bool = true")
    .Attr(GetPaddingAttrString())
    .Attr(GetConvnetDataFormatAttrString());
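
Note that REGISTER_OP only declares the op's interface. The error message is about kernels, which are registered separately, and it lists every kernel it found for the op. The sketch below pulls those kernel constraints and the node's attrs out of the error text so they can be compared side by side. The regular expressions are assumptions fitted to the message shown above, not a stable TensorFlow format, and the helper names are hypothetical. The label check reflects my understanding that a kernel registered with a label is only selected for a node carrying a matching `_kernel` attr, so treat that part as one possible reading rather than a confirmed diagnosis.

```python
import re

# Error text copied from the traceback above.
ERROR_TEXT = """No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='CPU'; label='MklLayer'; T in [DT_FLOAT]

     [[Node: Conv/BiasAdd = MklConv2DWithBias[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true](ones, DMT/_0, Conv/weights/read, DMT/_1, Conv/biases/read, DMT/_4)]]
"""


def parse_registered_kernels(text):
    """Return one dict per "device=..." line of the error message."""
    kernels = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("device="):
            continue
        kernel = {"device": None, "label": None, "types": []}
        for part in line.split(";"):
            part = part.strip()
            match = re.match(r"(device|label)='([^']*)'$", part)
            if match:
                kernel[match.group(1)] = match.group(2)
                continue
            match = re.match(r"T in \[([^\]]*)\]$", part)
            if match:
                kernel["types"] = [t.strip() for t in match.group(1).split(",")]
        kernels.append(kernel)
    return kernels


def parse_node_attrs(text):
    """Return the failing node's attrs as a dict of raw strings."""
    match = re.search(r"\[\[Node: \S+ = \w+\[(.*?)\]\(", text, re.S)
    if not match:
        return {}
    pairs = re.findall(r'(\w+)=(\[[^\]]*\]|"[^"]*"|[^,\s]+)', match.group(1))
    return dict(pairs)


kernels = parse_registered_kernels(ERROR_TEXT)
attrs = parse_node_attrs(ERROR_TEXT)
for kernel in kernels:
    dtype_ok = attrs.get("T") in kernel["types"]
    # Assumption: a labelled kernel needs a matching "_kernel" attr on the node.
    node_label = attrs.get("_kernel", "").strip('"')
    label_ok = kernel["label"] is None or node_label == kernel["label"]
    print("device=%s dtype matches: %s, label matches: %s"
          % (kernel["device"], dtype_ok, label_ok))
```

Read this way, the node's T=DT_FLOAT already satisfies the kernel's type constraint, which suggests the filter dtype is not the mismatch. The printed node carries no `_kernel` attr while the only registered kernel has the label 'MklLayer', so the label may be what fails to match.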

I read in no-opkernel-was-registered-to-support-op-conv2d-with-these-attrs that the input and the filter should both be float32. I am sure the input is float32, but how do I set the filter to float32 here?

I tried converting the argument types, but that failed. The error is still there:

slim.conv2d(x1, np.float32(64.0), np.array([5.0, 5.0], dtype=np.float32), weights_initializer=tf.ones_initializer, padding='SAME')

How to reproduce the problem

I found a way to reproduce this issue.

1. Download the tensorflow zip and extract it.

https://drive.google.com/file/d/1k616LEvgTUXpHuX6Fz7EDXukaG5tGZwQ/view

2. Mount the tensorflow folder into a container:

docker run --rm -it -v $(pwd):/code econtal/numpy-mkl  bash

3. Create the test script:

$ cd /code
$ vi test_conv2d.py

# demo from https://www.jianshu.com/p/a70c1d931395
import tensorflow as tf 
import tensorflow.contrib.slim as slim
import numpy as np

x1 = tf.ones(shape=[1, 64, 64, 3]) 
w = tf.fill([5, 5, 3, 64], 1)
# print("rank is", tf.rank(x1))

# x1 = tf.cast(x1, tf.float32)  
w = tf.cast(w, tf.float32)

print('-----debugging-----')
print(type(x1))
print(x1.dtype.base_dtype)

print(type(w))
print(w.dtype.base_dtype)
print('-------------------')

# x1 = tf.cast(x1, tf.float16)

y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
y2 = slim.conv2d(x1, np.float32(64.0), np.array([5.0, 5.0], dtype=np.float32),
                 weights_initializer=tf.ones_initializer, padding='SAME')

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 
    y1_value,y2_value,x1_value=sess.run([y1,y2,x1])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value==y2_value)
    print(y1_value)
    print(y2_value)

4. Run the test script:

$ python test_conv2d.py

Can anyone offer some advice on how to track down the problem? Thanks.

Answer 1

I also raised the same problem on the TensorFlow issue tracker. The suggestion there was to use version 1.8, which worked when I tested it.
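
A quick way to check whether an environment already has a new enough build is to compare version strings. This is a minimal sketch: `version_tuple` and `is_at_least` are hypothetical helpers, the 1.8.0 threshold comes from the suggestion above, and release suffixes such as "-rc1" are simply ignored.

```python
import re


def version_tuple(version):
    """Turn '1.8.0' or '1.8.0-rc1' into (1, 8, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group(0)))
    # Pad so that '1.8' compares equal to '1.8.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_least(version, minimum="1.8.0"):
    """Return True if version is the same as or newer than minimum."""
    return version_tuple(version) >= version_tuple(minimum)


def installed_tensorflow_version():
    """Return tf.__version__, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


# The build from the question versus the suggested one.
print(is_at_least("1.1.0"))
print(is_at_least("1.8.0"))
```

Calling `is_at_least(installed_tensorflow_version())` after checking for `None` applies the same test to whatever is installed.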

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'MklConv2DWithBias' with these attrs
