Compiling CUDA code into a static library (.a) on Linux

Asked: 2019-03-05 15:41:03

Tags: c cuda static-linking

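I am trying to compile a small library that contains CUDA code.

I have already managed to build it as a shared library, but what I really need is a static library.

I have two source files:

  • main.c: contains a test function written in C. I compile this file with gcc.

  • main_kernel.cu: contains a CUDA kernel "testKernel" and a C wrapper function "test_gpu" that calls testKernel.

Here is an excerpt from main_kernel.cu: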

__global__ void testKernel(float *data, const int l)
{
    int idx = blockIdx.x*blockDim.x+threadIdx.x;
    if (idx < l)
        data[idx]++;
}

#ifdef __cplusplus
extern "C" {
#endif
void test_gpu(float *data, const int length)
{
    // Run kernel
    testKernel<<< 512, 1024 >>>(data, length);
}
#ifdef __cplusplus
}
#endif

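I compile main.c into main.o with gcc (this works as expected).

I compile main_kernel.cu with nvcc and the -rdc=true option into an intermediate object that I call main_kernel_h.o.

I then run nvcc with the -dlink option to device-link the intermediate object into main_kernel.o.

Finally, following this answer, I bundle all three objects into a static library using ar with the rcs flags.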
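As a rough sketch, the build steps described above correspond to the commands below. The file names are taken from the question; the -gencode flags from the Makefile are left out for brevity, and the DRY_RUN switch and the run helper are illustrative assumptions that are not part of the original build. By default the script only prints the commands, so it can be read and run without a CUDA toolchain installed.

```shell
#!/bin/sh
# Sketch of the build pipeline from the question.
# DRY_RUN and run() are hypothetical helpers added for illustration:
# with DRY_RUN=1 (the default) commands are only printed; with DRY_RUN=0
# they are also executed, which requires gcc, nvcc and ar.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# 1. Plain C object.
run gcc -c main.c -o main.o

# 2. CUDA source compiled with relocatable device code (host + device object).
run nvcc -rdc=true -c main_kernel.cu -o main_kernel_h.o

# 3. Device-link step: produces an object holding the linked device code.
run nvcc -dlink main_kernel_h.o -o main_kernel.o

# 4. Bundle all three objects into the static library.
run ar rcs libLib.a main_kernel.o main_kernel_h.o main.o
```

Running it with DRY_RUN=0 in the source directory would perform the actual build, assuming the tools are on the PATH.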

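This all works fine; the problem appears when I link an executable against the new library.

I then get a bunch of undefined references to CUDA functions. This is the exact error: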

../../build/test/bin/libLib.a(main_kernel_h.o): In function `__nv_cudaEntityRegisterCallback(void**)':
tmpxft_0000422a_00000000-5_main_kernel.compute_52.cudafe1.cpp:(.text+0x60): undefined reference to `__cudaRegisterFunction'
../../build/test/bin/libLib.a(main_kernel_h.o): In function `__device_stub__Z10testKernelPfi(float*, int)':
tmpxft_0000422a_00000000-5_main_kernel.compute_52.cudafe1.cpp:(.text+0x8a): undefined reference to `cudaSetupArgument'
tmpxft_0000422a_00000000-5_main_kernel.compute_52.cudafe1.cpp:(.text+0xb0): undefined reference to `cudaSetupArgument'
tmpxft_0000422a_00000000-5_main_kernel.compute_52.cudafe1.cpp:(.text+0xc7): undefined reference to `cudaLaunch'
../../build/test/bin/libLib.a(main_kernel_h.o): In function `test_gpu':
tmpxft_0000422a_00000000-5_main_kernel.compute_52.cudafe1.cpp:(.text+0x124): undefined reference to `cudaConfigureCall'
../../build/test/bin/libLib.a(main_kernel.o): In function `__cudaUnregisterBinaryUtil':
link.stub:(.text+0xf): undefined reference to `__cudaUnregisterFatBinary'
../../build/test/bin/libLib.a(main_kernel.o): In function `__cudaRegisterLinkedBinary(__fatBinC_Wrapper_t const*, void (*)(void**), void*)':
link.stub:(.text+0xd0): undefined reference to `__cudaRegisterFatBinary'

This is the output I get from nm:

main.o:
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 r .LC0
0000000000000000 T testFunc
                 U test_gpu

main_kernel.o:
                 U atexit
                 U __cudaRegisterFatBinary
0000000000000015 T __cudaRegisterLinkedBinary_57_tmpxft_0000422a_00000000_9_main_kernel_compute_52_cpp1_ii_335679f8
0000000000000000 t __cudaUnregisterBinaryUtil
                 U __cudaUnregisterFatBinary
0000000000000000 r fatbinData
                 U __fatbinwrap_57_tmpxft_0000422a_00000000_9_main_kernel_compute_52_cpp1_ii_335679f8
0000000000000000 r _ZL15__fatDeviceText
0000000000000000 b _ZL20__cudaFatCubinHandle
0000000000000010 b _ZL22__cudaPrelinkedFatbins
000000000000005b t _ZL26__cudaRegisterLinkedBinaryPK19__fatBinC_Wrapper_tPFvPPvES2_
0000000000000000 r _ZL87def_module_id_str_57_tmpxft_0000422a_00000000_9_main_kernel_compute_52_cpp1_ii_335679f8
0000000000000020 b _ZZ96__cudaRegisterLinkedBinary_57_tmpxft_0000422a_00000000_9_main_kernel_compute_52_cpp1_ii_335679f8E3__p
0000000000000030 b _ZZL26__cudaRegisterLinkedBinaryPK19__fatBinC_Wrapper_tPFvPPvES2_E16__callback_array
0000000000000028 b _ZZL26__cudaRegisterLinkedBinaryPK19__fatBinC_Wrapper_tPFvPPvES2_E3__i

main_kernel_h.o:
                 U cudaConfigureCall
                 U cudaLaunch
                 U __cudaRegisterFunction
                 U __cudaRegisterLinkedBinary_57_tmpxft_0000422a_00000000_9_main_kernel_compute_52_cpp1_ii_335679f8
                 U cudaSetupArgument
0000000000000000 r fatbinData
0000000000000000 D __fatbinwrap_57_tmpxft_0000422a_00000000_9_main_kernel_compute_52_cpp1_ii_335679f8
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 r .LC0
00000000000000e0 T test_gpu
00000000000000d0 T _Z10testKernelPfi
0000000000000070 T _Z31__device_stub__Z10testKernelPfiPfi
0000000000000000 r _ZL15__module_id_str
0000000000000000 t _ZL22____nv_dummy_param_refPv
0000000000000000 t _ZL24__sti____cudaRegisterAllv
0000000000000010 t _ZL31__nv_cudaEntityRegisterCallbackPPv
0000000000000030 b _ZL32__nv_fatbinhandle_for_managed_rt
0000000000000020 b _ZZ31__device_stub__Z10testKernelPfiPfiE3__f
0000000000000010 b _ZZL22____nv_dummy_param_refPvE5__ref
0000000000000000 b _ZZL31__nv_cudaEntityRegisterCallbackPPvE5__ref

In case you want my exact commands, I have also included the makefiles used to build the objects.

Makefile for the library:

ARCH = -gencode arch=compute_30,code=sm_30 \
       -gencode arch=compute_35,code=sm_35 \
       -gencode arch=compute_50,code=[sm_50,compute_50] \
       -gencode arch=compute_52,code=[sm_52,compute_52]

VPATH=.
SLIB=libLib.so
ALIB=libLib.a
OBJDIR=../../build/test/bin-int/lib/
OUTDIR=../../build/test/bin/

# Base C-stuff
CC=gcc
CPP=g++
NVCC=nvcc
AR=ar
ARFLAGS=rcs
OPTS=-Ofast
LDFLAGS= -lm -pthread -lc
COMMON= -DEXT_SO
CFLAGS=-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors

# OPTS=-O0 -g # <- Debug

CFLAGS+=$(OPTS)

# CUDA
COMMON+= -I/usr/local/cuda/include/
LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand

# CUDNN
LDFLAGS+= -lcudnn

# C-objects
OBJ=main.o

# CUDA-objects
# LDFLAGS+= -lstdc++ # <- Unsure if this is required
OBJ_CUDA=main_kernel.o
CUDA_HOST=main_kernel_h.o

OBJS = $(addprefix $(OBJDIR), $(OBJ))
OBJS_CUDA = $(addprefix $(OBJDIR), $(OBJ_CUDA))
OBJS_HOST = $(addprefix $(OBJDIR), $(CUDA_HOST))
DEPS = $(wildcard ./*.h) Makefile

# Build all steps
all: obj $(OBJS) $(OBJS_HOST) $(OBJS_CUDA) $(ALIB)

# Link static lib
$(ALIB): $(OBJS_CUDA) $(OBJS_HOST) $(OBJS)
    $(AR) $(ARFLAGS) $(OUTDIR)$@ $^

# Compile c
$(OBJDIR)%.o: %.c $(DEPS)
    $(CC) $(COMMON) $(CFLAGS) -c $< -o $@

# Compile cuda-hostcode
$(OBJDIR)%_h.o: %.cu $(DEPS)
    $(NVCC) $(ARCH) -c -rdc=true --compiler-options "$(CFLAGS)" $< -o $@

# Device Link device code   
$(OBJDIR)%.o: $(OBJDIR)%_h.o $(DEPS)
    $(NVCC) $(ARCH) -dlink -o $@ $< -lcuda -lcudart -lcublas -lcurand -lcudnn

obj:
    mkdir -p $(OBJDIR)

.PHONY: clean

clean:
    rm -rf $(OBJS) $(ALIB) $(OBJDIR)/*

And for the executable that tries to link against the static library:

VPATH=.
EXEC=Test
OBJDIR=../../build/test/bin-int/test/
OUTDIR=../../build/test/bin/
LIB=$(OUTDIR)libLib.a

# Base C-stuff
CC=gcc
CPP=g++
OPTS=-Ofast
LDFLAGS= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand -lcudnn -Wl,-rpath,'$$ORIGIN' -s
CFLAGS= -MMD -MP -DNDEBUG -DSTRIP_PYTHON -I../src -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors

# OPTS=-O0 -g # <- Debug

CFLAGS+=$(OPTS)

# C-objects
OBJ=test.o

OBJS = $(addprefix $(OBJDIR), $(OBJ))
DEPS = $(wildcard ./*.h) Makefile

# Build all steps
all: obj $(EXEC)

# Link executable
$(EXEC): $(OBJS)
    $(CC) $^ -o $(OUTDIR)$@ $(LDFLAGS) $(LIB)

# Compile c
$(OBJDIR)%.o: %.c $(DEPS)
    $(CC) $(CFLAGS) -c $< -o $@

obj:
    mkdir -p $(OBJDIR)

.PHONY: clean

clean:
    rm -rf $(OBJS) $(OBJDIR)/*

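I hope you can help me find my mistake.

1 answer:

Answer 0 (score: 0)

It turned out that the problem was not in how the static library was compiled, but in how it was linked.

Changing the following solved it for me: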

# Link executable
$(EXEC): $(OBJS)
    $(CC) $^ -o $(OUTDIR)$@ $(LDFLAGS) $(LIB)

to:

# Link executable
$(EXEC): $(OBJS)
    $(CC) $^ -o $(OUTDIR)$@ $(LIB) $(LDFLAGS)

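With this order, the static library (which is just a collection of object files) is linked together with the other objects before the CUDA libraries in $(LDFLAGS), so the linker already knows which CUDA symbols are still undefined when it reaches those libraries.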
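The effect of the ordering can be reproduced without CUDA. The following is a minimal sketch using two toy static libraries; every file and symbol name in it (dep_fn, lib_fn, libdep.a, liblib.a) is hypothetical, and it only assumes that gcc and ar are available. Note that the CUDA libraries in the question are most likely shared objects, where the behaviour may additionally depend on whether the toolchain passes --as-needed to the linker by default; that is an assumption on my part and could explain the version dependence.

```shell
#!/bin/sh
# Toy demonstration of static-library link order (all names are hypothetical).
# Requires gcc and ar; nothing CUDA-specific is involved.
work=$(mktemp -d)

# dep.c stands in for the CUDA runtime, lib.c for libLib.a, main.c for test.c.
printf 'int dep_fn(void) { return 42; }\n' > "$work/dep.c"
printf 'int dep_fn(void);\nint lib_fn(void) { return dep_fn(); }\n' > "$work/lib.c"
printf 'int lib_fn(void);\nint main(void) { return lib_fn() == 42 ? 0 : 1; }\n' > "$work/main.c"

for name in dep lib main; do
    gcc -c "$work/$name.c" -o "$work/$name.o"
done
ar rcs "$work/libdep.a" "$work/dep.o"
ar rcs "$work/liblib.a" "$work/lib.o"

# Try a given library order and report whether the link step succeeded.
try_link() {
    if gcc "$work/main.o" -L"$work" "$@" -o "$work/app" 2>/dev/null; then
        echo linked
    else
        echo failed
    fi
}

dep_first=$(try_link -ldep -llib)   # dependency before the library that needs it
dep_last=$(try_link -llib -ldep)    # dependency after it, as in the fixed Makefile

echo "dependency first: $dep_first"
echo "dependency last:  $dep_last"
rm -rf "$work"
```

With GNU ld I would expect the first attempt to fail with an undefined reference to dep_fn and the second to succeed, because an archive member is only pulled in if a symbol it defines is already undefined when the archive is scanned. Other linkers may accept both orders.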

A note for anyone who runs into this in the future: whether this ordering actually causes an error seems to depend on the compiler version.