Question

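My goal is to run a TensorFlow model in real time to control a vehicle from a learned model. Our vehicle system uses ROS (Robot Operating System), which is tightly coupled to OpenCV. As a result, I receive the image of interest from ROS as an OpenCV Mat.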

    cv::Mat cameraImg;

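I would like to create a TensorFlow Tensor directly from the data in this OpenCV matrix, to avoid the cost of copying the matrix element by element. Using the answer to this question, I managed to get a forward pass of the network working with the following code: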

    // Convert the pixel data to 32-bit float, three channels
    cameraImg.convertTo(cameraImg, CV_32FC3);

    Tensor inputImg(DT_FLOAT, TensorShape({1,inputheight,inputwidth,3}));
    auto inputImageMapped = inputImg.tensor<float, 4>();
    auto start = std::chrono::system_clock::now();
    // Copy all the data over, reversing the channel order of each pixel
    for (int y = 0; y < inputheight; ++y) {
        const float* source_row = ((float*)cameraImg.data) + (y * inputwidth * 3);
        for (int x = 0; x < inputwidth; ++x) {
            const float* source_pixel = source_row + (x * 3);
            inputImageMapped(0, y, x, 0) = source_pixel[2];
            inputImageMapped(0, y, x, 1) = source_pixel[1];
            inputImageMapped(0, y, x, 2) = source_pixel[0];
        }
    }
    auto end = std::chrono::system_clock::now();

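However, with this approach the copy into the tensor takes between 80 ms and 130 ms, while the entire forward pass (for a 10-layer convolutional network) takes only 25 ms.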
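One possible contributor to the copy time (an assumption, not something measured here) is the per-element `inputImageMapped(0, y, x, c)` accessor, particularly in a build without compiler optimizations. The sketch below shows the same channel-reversing copy done through raw pointers. `copyBgrToRgb` is a hypothetical helper, and plain float buffers stand in for `cameraImg.data` and for the tensor's memory (for example, the pointer returned by `inputImg.flat<float>().data()`).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of OpenCV or TensorFlow.
// Copies a packed 3-channel float image from `src` to `dst`, reversing the
// channel order of every pixel. Both buffers are assumed to hold
// height * width * 3 contiguous floats and must not overlap.
void copyBgrToRgb(const float* src, float* dst, int height, int width) {
    assert(src != nullptr && dst != nullptr);
    const std::size_t pixels =
        static_cast<std::size_t>(height) * static_cast<std::size_t>(width);
    for (std::size_t i = 0; i < pixels; ++i) {
        const float* s = src + i * 3;
        float* d = dst + i * 3;
        d[0] = s[2];  // destination channel 0 <- source channel 2
        d[1] = s[1];  // middle channel is unchanged
        d[2] = s[0];  // destination channel 2 <- source channel 0
    }
}
```

This still copies every pixel once, so it only reduces the per-element overhead; it does not reuse the OpenCV memory.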

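Looking at the TensorFlow documentation, there appears to be a Tensor constructor that takes an allocator. However, I have not been able to find any TensorFlow or Eigen documentation covering this functionality, or covering the Eigen Map class as it relates to Tensors.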

Does anyone have any insight into how to speed up this code, ideally by reusing my OpenCV memory?

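Edit: I have successfully implemented what @mrry suggested and can now reuse the memory allocated by OpenCV. I have opened GitHub issue 8033 requesting that this be added to the TensorFlow source tree. My approach is not pretty, but it works.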

Compiling an external library and linking it against libtensorflow.so is still very difficult. The tensorflow cmake library may help with this, but I have not tried it yet.

Answer 1

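I know this is an old thread, but there is a zero-copy solution to your problem using the existing C++ API. I have updated your GitHub issue with my solution: tensorflow/issues/8033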

For the record, I am copying my solution here:

    // allocate a Tensor
    Tensor inputImg(DT_FLOAT, TensorShape({1,inputHeight,inputWidth,3}));

    // get pointer to memory for that Tensor
    float *p = inputImg.flat<float>().data();
    // create a "fake" cv::Mat from it
    cv::Mat cameraImg(inputHeight, inputWidth, CV_32FC3, p);

    // use it here as a destination
    cv::Mat imagePixels = ...; // get data from your video pipeline
    imagePixels.convertTo(cameraImg, CV_32FC3);

Hope this helps.

Answer 2

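The TensorFlow C API (as opposed to the C++ API) exports the TF_NewTensor() function, which allows you to create a tensor from a pointer and a length, and you can pass the resulting object to the TF_Run() function.

Currently, this is the only public API for creating a TensorFlow tensor from a pre-allocated buffer. There is no supported way to convert a TF_Tensor* to a tensorflow::Tensor, but if you look at the implementation, there is a private API with friend access that can do this. If you try this out and can show an appreciable speedup, we would consider adding it to the public API.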

Importing an OpenCV Mat into C++ TensorFlow without copying
