Here is an example sgemm program:
#include <mkl.h>
#include <iostream>
#include <cstdlib>
#define ITERATION 1
int main()
{
    // left is ra x lda, right is lda x ldb, ans is ra x ldb (all row-major).
    int ra = 128;
    int lda = 75;
    int ldb = 55;
    float* left = (float*)calloc(ra * lda, sizeof(float));
    float* right = (float*)calloc(ldb * lda, sizeof(float));
    float* ans = (float*)calloc(ra * ldb, sizeof(float));

    // Fill left with random values in [0, 1] and print it.
    std::cout << "left " << std::endl;
    for (int i = 0; i < ra; ++i) {
        for (int j = 0; j < lda; ++j) {
            left[i * lda + j] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
            std::cout << left[i * lda + j] << " ";
        }
        std::cout << std::endl;
    }

    // Fill right with random values in [0, 1] and print it.
    std::cout << "right " << std::endl;
    for (int i = 0; i < lda; ++i) {
        for (int j = 0; j < ldb; ++j) {
            right[i * ldb + j] = static_cast<float>(rand()) / static_cast<float>(RAND_MAX);
            std::cout << right[i * ldb + j] << " ";
        }
        std::cout << std::endl;
    }

    // ans = 1.0 * left * right + 0.0 * ans, with M = ra, N = ldb, K = lda.
    for (int i = 0; i < ITERATION; ++i) {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ra, ldb, lda, 1.0f, left, lda,
                    right, ldb, 0.0f, ans, ldb);
    }

    std::cout << "ans " << std::endl;
    for (int i = 0; i < ra; ++i) {
        for (int j = 0; j < ldb; ++j) {
            std::cout << ans[i * ldb + j] << " ";
        }
        std::cout << std::endl;
    }

    free(left);
    free(right);
    free(ans);
    return 0;
}
I compile this program with g++ using the options -fopenmp -lmkl_rt, and OMP_NUM_THREADS is set to 16.
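After running the program, I found that the answer is completely wrong compared with the MATLAB result. I would not call it wrong if the difference were only a small precision error. In addition, I observed that the program works correctly under these conditions: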
So I guess the problem may lie in the -fopenmp flag. Could you help me solve this problem? Thanks!
g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)
icc (ICC) 16.0.3 20160415
Linux kernel 2.6.32-279.el6.x86_64
Answer 0 (score: 0)
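According to the MKL link line advisor, you do not need to use -fopenmp with the single dynamic library -lmkl_rt to enable multithreading. Since your gcc is very old, this could be a problem.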
You could try traditional dynamic linking and compare the following two settings to see where the problem is.
Threaded MKL + GNU OpenMP
Link options: -Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -lm -ldl
Compile options: -fopenmp -m64 -I${MKLROOT}/include
Threaded MKL + Intel OpenMP
Link options: -Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm -ldl
Compile options: -m64 -I${MKLROOT}/include