Why is this call to a dynamic library function so slow?

Asked: 2015-08-28 14:08:24

Tags: python c shared-libraries ctypes

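I am writing a shared library to be called from Python. Since this is my first time using Python's ctypes module, and nearly my first time writing a shared library, I have been writing both C and Python code to call the library's functions.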

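For the heck of it, I put in some timing code and found that, while most of the C program's calls into the library are very fast, the first one is slow, in fact considerably slower than its Python counterpart. This goes against what I expected, and I am hoping someone can tell me why.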

Here is a trimmed-down version of the header file for my C library.

typedef struct MdaDataStruct
{
    int numPts;
    int numDists;
    float* data;
    float* dists;
} MdaData;

//allocate the structure
void* makeMdaStruct(int numPts, int numDist);

//deallocate the structure
void freeMdaStruct(void* strPtr);

//assign the data array
void setData(void* strPtr, float* divData);

Here is the C program that calls the functions:

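#include <stdio.h>
#include <time.h>
// also include the library header shown above (its file name is not given here)

int main(int argc, char* argv[])
{
    clock_t t1, t2;
    long long int diff;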
    //test the allocate function
    t1 = clock();
    MdaData* dataPtr = makeMdaStruct(10, 3);
    t2 = clock();
    diff = (((t2-t1)*1000000)/CLOCKS_PER_SEC);
    printf("make struct, took: %lld microseconds\n", diff);

    //make some data
    float testArr[10] = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9};

    //test the set data function
    t1 = clock();
    setData(dataPtr, testArr);
    t2 = clock();
    diff = (((t2-t1)*1000000)/CLOCKS_PER_SEC);
    printf("set data, took: %lld microseconds\n", diff);

    //test the deallocate function
    t1 = clock();
    freeMdaStruct(dataPtr);
    t2 = clock();
    diff = (((t2-t1)*1000000)/CLOCKS_PER_SEC);
    printf("free struct, took: %lld microseconds\n", diff);

    //exit
    return 0;
}

Here is the Python script that calls the functions:

import time
from ctypes import cdll, c_void_p

import numpy as np

# load the library
t1 = time.time()
cs_lib = cdll.LoadLibrary("./libChiSq.so")
t2 = time.time()
print("load library, took %d microseconds" % int((t2-t1)*1000000))
# tell python the function will return a void pointer
cs_lib.makeMdaStruct.restype = c_void_p
# make the structure to hold the MdaData with 10 data points and 3 dists
t1 = time.time()
mdaData = cs_lib.makeMdaStruct(10,3)
t2 = time.time()
print("make struct, took %d microseconds" % int((t2-t1)*1000000))
# make an array with the test data
divDat = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], np.float32)
# run the function to load the array into the struct
# (pointers are wrapped in c_void_p so they are not truncated to a C int)
t1 = time.time()
cs_lib.setData(c_void_p(mdaData), c_void_p(divDat.ctypes.data))
t2 = time.time()
print("set data, took %d microseconds" % int((t2-t1)*1000000))
# free the structure
t1 = time.time()
cs_lib.freeMdaStruct(c_void_p(mdaData))
t2 = time.time()
print("free struct, took %d microseconds" % int((t2-t1)*1000000))

Finally, here is the output from running the two back to back:

[]$ ./tester
make struct, took: 60 microseconds
set data, took: 2 microseconds
free struct, took: 2 microseconds
[]$ python so_py_tester.py 
load library, took 77 microseconds
make struct, took 3 microseconds
set data, took 23 microseconds
free struct, took 10 microseconds

As you can see, the C call to makeMdaStruct takes 60 us, whereas the Python call to makeMdaStruct takes 3 us, which is highly confusing.

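My best guess is that the C code somehow pays the cost of loading the library on the first call? That confuses me, because I thought the library was loaded when the program was loaded into memory.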

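Edit: I think there may be a kernel of truth to that guess, because I added extra untimed calls to makeMdaStruct and freeMdaStruct before the timed call to makeMdaStruct and got the following output in testing: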

[]$ ./tester
make struct, took: 1 microseconds
set data, took: 1 microseconds
free struct, took: 0 microseconds
[]$ python so_py_tester.py 
load library, took 70 microseconds
make struct, took 4 microseconds
set data, took 23 microseconds
free struct, took 12 microseconds

1 Answer:

Answer 0 (score: 4)

  

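"My best guess is that the C code somehow pays the cost of loading the library on the first call? That confuses me, because I thought the library was loaded when the program was loaded into memory."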

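You are right on both counts. The library is loaded when the program is loaded. However, the dynamic loader/linker defers symbol resolution until function call time.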

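Calls into a shared library are made indirectly through an entry in the procedure linkage table (PLT). Initially, every entry in the PLT routes to ld.so. The first time a function is called, ld.so looks up the actual address of the symbol, updates the entry, and then jumps to the function; later calls go straight to the function. This is "lazy" symbol resolution, and it is why only the first call pays the extra cost.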
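The mechanism described above can be sketched as a toy model in Python. This is only an illustration of the idea: real lazy binding is done by ld.so in machine code, and the names `SYMBOLS`, `plt`, `make_stub`, and `resolver_calls` are hypothetical, chosen for this sketch.

```python
# Toy model only: real lazy binding happens inside ld.so, not in Python.

def make_mda_struct(num_pts, num_dists):
    """Stand-in for the real library function."""
    return {"numPts": num_pts, "numDists": num_dists}

SYMBOLS = {"makeMdaStruct": make_mda_struct}  # what the library exports
resolver_calls = []                           # records each slow lookup

def make_stub(name):
    """Build the initial table entry, which routes to the resolver."""
    def stub(*args):
        resolver_calls.append(name)  # slow path: look the symbol up
        target = SYMBOLS[name]
        plt[name] = target           # patch the table entry
        return target(*args)         # then jump to the real function
    return stub

# Every entry starts out pointing at a resolver stub.
plt = {name: make_stub(name) for name in SYMBOLS}

first = plt["makeMdaStruct"](10, 3)   # goes through the resolver
second = plt["makeMdaStruct"](10, 3)  # calls the function directly
print(resolver_calls)
```

The resolver runs once per symbol, which mirrors the one-time cost seen on the first timed call in the C program.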

You can set the LD_BIND_NOW environment variable to change this behavior. From ld.so(8):

  

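  LD_BIND_NOW
      (libc5; glibc since 2.1.1) If set to a nonempty string, causes
      the dynamic linker to resolve all symbols at program startup
      instead of deferring function call resolution to the point
      when they are first referenced.  This is useful when using a
      debugger.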
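As a minimal sketch of trying this out, the test program could be launched from Python with the variable set. The helper names `eager_binding_env` and `run_with_eager_binding` are hypothetical, and the sketch assumes the `./tester` binary from the question exists in the current directory.

```python
import os
import subprocess

def eager_binding_env(base=None):
    """Return a copy of the environment with LD_BIND_NOW set."""
    env = dict(os.environ if base is None else base)
    env["LD_BIND_NOW"] = "1"  # any non-empty string requests eager binding
    return env

def run_with_eager_binding(cmd):
    """Run cmd (for example ["./tester"]) with symbols resolved at startup."""
    return subprocess.run(cmd, env=eager_binding_env(), check=True)
```

Calling `run_with_eager_binding(["./tester"])` is equivalent to running `LD_BIND_NOW=1 ./tester` in a shell. With eager binding, the resolution cost should move from the first call to program startup, so the first timed call would be expected to look like the later ones.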

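This behavior can also be changed at link time. From ld(1) (the excerpt shows the default lazy keyword; the complementary now keyword requests binding at load time):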

  -z keyword
      The recognized keywords are:
      ...
      lazy
           When generating an executable or shared library, mark it to
           tell the dynamic linker to defer function call resolution to
           the point when the function is called (lazy binding), rather
           than at load time.  Lazy binding is the default.
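On the Python side, a likely reason the first timed call looks cheap is that ctypes resolves a symbol when the function attribute is first accessed, not when it is called. In the script from the question, `cs_lib.makeMdaStruct` is first touched on the untimed `restype` line, whereas `setData` and `freeMdaStruct` are first touched inside their timed regions. The sketch below shows the lookup happening at attribute access; it assumes a POSIX system, where `CDLL(None)` opens the running process and exposes libc symbols such as `abs`.

```python
import ctypes

# Assumption: POSIX system. CDLL(None) opens the running process itself,
# which exposes libc symbols such as abs().
lib = ctypes.CDLL(None)

print("abs" in vars(lib))  # symbol not looked up yet
func = lib.abs             # attribute access triggers the symbol lookup
print("abs" in vars(lib))  # the function object is now cached on lib

try:
    lib.no_such_function_xyz  # hypothetical name that should not exist
except AttributeError as exc:
    # The failure happens here, at attribute access, before any call.
    print("lookup failed at attribute access:", exc)
```

This is an interpretation of the timings, not something the measurements in the question prove on their own.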

Further reading: