Question

I am trying to wrap some C code with Cython. Where I am stuck is a function defined in a file named "algo.c":

int update_emission_mat(
// input
int* count_data, 
double *bound_state_data_pmf, double *unbound_state_data_pmf,
int silent_states_begin, int nucleosome_present, 
int nucleosome_start, int n_padding_states,
// output matrix and its dimensions
double *emission_mat, int n_obs, int n_states
) {......}

The C code is correct, as it has been used and tested before. In "algo.h" I then declared the same function:

int update_emission_mat(
// input
int *count_data, 
double *bound_state_data_pmf, double *unbound_state_data_pmf,
int silent_states_begin, int nucleosome_present, 
int nucleosome_start, int n_padding_states,
// output matrix and its dimensions
double *emission_mat, int obs_len, int n_states
);

To wrap the function for Cython, I also have a file "algo.pxd" that contains the following:

cdef extern from "algo.h":
    ...
    int update_emission_mat(
        # input
        int *count_data, 
        double *bound_state_data_pmf, double *unbound_state_data_pmf,
        int silent_states_begin, int nucleosome_present, 
        int nucleosome_start, int n_padding_states,
        # output matrix and its dimensions
        double *emission_mat, int obs_len, int n_states
    )

Finally, in the main Cython file "main.pyx", I defined a class:

cimport algo
import numpy as np
cimport numpy as np
import cython
cdef class main:
    ... 
    cdef np.ndarray data_emission_matrix
    cdef np.ndarray count_data
    ...
    # in one function of the class, I defined and initialized data_emission_matrix
    cpdef alloc_space(self):
        ...
        cdef np.ndarray[np.double_t, ndim = 2] data_emission_matrix = np.ones((self.n_obs, self.n_states), dtype = np.float64, order = 'C')
        self.data_emission_matrix = data_emission_matrix
        ...

    # in another function, I defined and initialized count_data
    cpdef read_counts_data(self, data_file):
        df = open(data_file, 'r') # data_file only contains a column of integers
        dfc = df.readlines()
        df.close()
        cdef np.ndarray[np.int_t, ndim = 1] count_data = np.array(dfc, dtype = np.int, order = 'C')    
        self.count_data = count_data

    # finally, called the c function
    cpdef update_data_emission_matrix_using_counts_data(self):
        ....

        cdef np.ndarray[np.int, ndim = 1] count_data = self.count_data
        cdef np.ndarray[np.double_t, ndim = 2] data_emission_matrix = \
            self.data_emission_matrix

        cdef int n_padding_states = 5
        algo.update_emission_mat(
            &count_data[0], &bound_state_data_pmf[0], 
            &unbound_state_data_pmf[0], self.silent_states_begin,
            self.nucleosome_present, self.nucleosome_start, 
            n_padding_states, &data_emission_matrix[0,0],
            self.n_obs, self.n_states
            )

I cannot compile the file. The error message complains about taking the address of "count_data":

Error compiling Cython file:
------------------------------------------------------------
...
        self.data_emission_matrix

    cdef int n_padding_states = 5

    algo.update_emission_mat(
        &count_data[0], &bound_state_data_pmf[0],
       ^
------------------------------------------------------------

main.pyx:138:12: Cannot take address of Python variable

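I am confused because I treat "data_emission_matrix" in essentially the same way, yet Cython does not complain about it. I apologize for the lengthy code. I am new to Cython and cannot figure out exactly where the error comes from... I appreciate any help!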

Answer 1

If you make sure the ndarray is C_CONTIGUOUS, use the following to pass the address of its data buffer:

<int *>count_data.data
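
Why the contiguity condition matters for this cast can be illustrated outside Cython. The following is a minimal sketch in plain Python, using the standard-library array, memoryview and ctypes as stand-ins for the NumPy array and the C pointer. The helper name as_c_int_array is hypothetical and not part of the original code, and the sketch assumes a one-dimensional buffer:

```python
import ctypes
from array import array


def as_c_int_array(buf):
    """Expose a 1-D buffer of C ints as a ctypes array without copying.

    Hypothetical helper, standing in for `<int *>count_data.data`.
    """
    view = memoryview(buf)
    if view.ndim != 1:
        raise ValueError("only 1-D buffers are handled in this sketch")
    # A raw pointer only describes the data correctly when the elements
    # sit back to back in memory.
    if not view.c_contiguous:
        raise ValueError("buffer is not C-contiguous")
    # The element type must really be C int for an `int *` to be valid.
    if view.format != "i":
        raise ValueError("buffer elements are not C int")
    return (ctypes.c_int * len(view)).from_buffer(view)


counts = array("i", [3, 1, 4, 1, 5])
c_counts = as_c_int_array(counts)
c_counts[0] = 9  # writes through to the original buffer, no copy is made
print(counts[0])

# A strided view over the same data is not contiguous, so handing out
# its base address would skip the stride information.
strided = memoryview(counts)[::2]
print(strided.c_contiguous)
```

The same reasoning applies to the NumPy case: the cast hands the C function a bare address, so any stride or element-type information the array carries is lost at that point.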

Edit:

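The real problem causing the error is the element type of count_data, which should be np.int_t. In update_data_emission_matrix_using_counts_data the buffer is declared as np.ndarray[np.int, ndim = 1], whereas read_counts_data declares it with np.int_t.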
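
One caveat when handing such a buffer to a function that takes int *: the element size of the array has to match C int. In Cython's NumPy declarations np.int_t has typically been a typedef for C long, while np.intc_t (with dtype np.intc) is the name that corresponds to C int, so it may be worth checking which one matches on your platform. The sketch below uses only the Python standard library to show what a C function would see if the sizes differ. The helper reinterpret_as_c_int is hypothetical and only mimics the pointer cast:

```python
import ctypes
from array import array


def reinterpret_as_c_int(buf, n):
    """Read the first n C ints from a writable buffer without converting it.

    Hypothetical helper: mimics what a C function sees when it is handed
    a pointer such as `<int *>count_data.data`.
    """
    return list((ctypes.c_int * n).from_buffer(buf))


int_counts = array("i", [1, 2, 3])   # elements are C int
long_counts = array("l", [1, 2, 3])  # elements are C long

print(ctypes.sizeof(ctypes.c_int), ctypes.sizeof(ctypes.c_long))
print(reinterpret_as_c_int(int_counts, 3))
# Where C long is wider than C int, this no longer yields 1, 2, 3,
# because each long is read as more than one int.
print(reinterpret_as_c_int(long_counts, 3))
```

If the two sizes differ on your system, declaring the array with a dtype that matches C int avoids the mismatch without changing the C function.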
