我正在尝试在C ++ OpenCV项目中运行Faster-RCNN对象检测器。
据我所知(https://github.com/opencv/opencv_contrib/issues/490),前一段时间DNN模块的计算时间存在一些问题。
有些人发现Gemm方法是瓶颈,解决方案是使用MKL或OpenBlas实现它。 (https://github.com/opencv/opencv_contrib/blob/86342522b0eb2b16fa851c020cc4e0fef4e010b7/modules/dnn/src/layers/op_blas.cpp#L100)
据说OpenCV的新版本已经解决了这个问题,计算时间与Caffe的实现几乎相同。
我已经在Ubuntu 16.04中编译了最新的GitHub版本的OpenCV for C ++,其中包含以下信息:
General configuration for OpenCV 4.0.0-pre =====================================
Version control: unknown
Extra modules:
Location (extra): /home/alex/Documents/opencv_contrib/modules
Version control (extra): unknown
Platform:
Timestamp: 2018-04-19T11:12:11Z
Host: Linux 4.13.0-38-generic x86_64
CMake: 3.5.1
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: Release
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (3 files): + SSSE3 SSE4_1
SSE4_2 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (9 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ Compiler: /usr/bin/c++ (ver 5.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release):
Linker flags (Debug):
ccache: NO
Precompiled headers: YES
Extra dependencies: dl m pthread rt /home/alex/Halide/build/lib/libHalide.so
3rdparty dependencies:
OpenCV modules:
To be built: aruco bgsegm bioinspired calib3d ccalib core datasets dnn dnn_objdetect dpm face features2d flann freetype fuzzy hfs highgui img_hash imgcodecs imgproc java_bindings_generator line_descriptor ml objdetect optflow phase_unwrapping photo plot python2 python_bindings_generator reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab xfeatures2d ximgproc xobjdetect xphoto
Disabled: js world
Disabled by dependency: -
Unavailable: cnn_3dobj cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev cvv dnn_modern hdf java matlab ovis python3 sfm viz
Applications: tests perf_tests examples apps
Documentation: NO
Non-free algorithms: NO
GUI:
GTK+: YES (ver 2.24.30)
GThread : YES (ver 2.48.2)
GtkGlExt: NO
VTK support: NO
Media I/O:
ZLib: /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.8)
JPEG: /usr/lib/x86_64-linux-gnu/libjpeg.so (ver )
WEBP: build (ver encoder: 0x020e)
PNG: /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.2.54)
TIFF: /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.0.6)
JPEG 2000: /usr/lib/x86_64-linux-gnu/libjasper.so (ver 1.900.1)
OpenEXR: build (ver 1.7.1)
Video I/O:
DC1394: YES (ver 2.2.4)
FFMPEG: YES
avcodec: YES (ver 56.60.100)
avformat: YES (ver 56.40.101)
avutil: YES (ver 54.31.100)
swscale: YES (ver 3.1.101)
avresample: NO
GStreamer: NO
libv4l/libv4l2: NO
v4l/v4l2: linux/videodev2.h
gPhoto2: NO
Parallel framework: pthreads
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2017.0.3 [2017.0.3]
at: /home/alex/Documents/opencv/build/3rdparty/ippicv/ippicv_lnx
Intel IPP IW: sources (2017.0.3)
at: /home/alex/Documents/opencv/build/3rdparty/ippicv/ippiw_lnx
Lapack: YES (/usr/lib/liblapack.so /usr/lib/libcblas.so /usr/lib/libatlas.so)
Halide: YES (/home/alex/Halide/build/lib/libHalide.so /home/alex/Halide/build/include)
Eigen: NO
Custom HAL: NO
Protobuf: build (3.5.1)
OpenCL: YES (no extra features)
Include path: /home/alex/Documents/opencv/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python 2:
Interpreter: /usr/bin/python2.7 (ver 2.7.12)
Libraries: /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.12)
numpy: /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
packages path: lib/python2.7/dist-packages
Python (for build): /usr/bin/python2.7
Java:
ant: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Matlab: NO
Install to: /usr/local
-----------------------------------------------------------------
正如您可以检查我的OpenCV是否已经使用OpenBlas(Laplack)支持编译,但是现在我的问题,我不知道在我的代码中我是否必须指定我想用OpenBlas运行DNN net.forward方法或者默认情况下完成。
我的代码示例:
double kInpWidth = 800;
double kInpHeight = 600;
// Resize Image
Mat img = Camera.ActualFrame;
cv::resize(img, img, Size(kInpWidth, kInpHeight));
// Convert image to blob for the net
Mat blob = blobFromImage(img, 1.0, Size(), Scalar(102.9801, 115.9465, 122.7717), false, false);
Mat imInfo = (Mat_<float>(1, 3) << img.rows, img.cols, 1.6f);
// Pass data through the net
net.setInput(blob, "data");
net.setInput(imInfo, "im_info");
// Net Forward
Mat detections = net.forward();
const float* data = (float*)detections.data;
我的计算时间为25秒/帧,这与我在CPU模式下运行Faster-RCNN时的预期不同。
是否有人以相对较低的计算时间使用它?如何检查Gemm函数是否由OpenBlas计算?