Given a data matrix with discrete entries, represented as a 2D numpy array, I am trying to compute the observed frequencies of some features (columns) while looking only at some instances (rows of the matrix). With numpy I can do it easily by applying bincount to each slice after some fancy slicing; doing it in pure Python, with an external data structure as a count accumulator, would be a C-style double loop.
import numpy
import numba

try:
    from time import perf_counter
except ImportError:  # Python < 3.3 has no perf_counter
    from time import time
    perf_counter = time
def estimate_counts_numpy(data,
                          instance_ids,
                          feature_ids):
    """
    WRITEME
    """
    #
    # slicing the data array (probably memory consuming)
    curr_data_slice = data[instance_ids, :][:, feature_ids]
    estimated_counts = []
    for feature_slice in curr_data_slice.T:
        counts = numpy.bincount(feature_slice)
        #
        # checking just for the all-zeros case:
        # this is not stable for non-binary datasets TODO: fix it
        if counts.shape[0] < 2:
            counts = numpy.append(counts, [0], 0)
        estimated_counts.append(counts)
    return estimated_counts
@numba.jit(numba.types.int32[:, :](numba.types.int8[:, :],
                                   numba.types.int32[:],
                                   numba.types.int32[:],
                                   numba.types.int32[:],
                                   numba.types.int32[:, :]))
def estimate_counts_numba(data,
                          instance_ids,
                          feature_ids,
                          feature_vals,
                          estimated_counts):
    """
    WRITEME
    """
    #
    # actual counting
    for i, feature_id in enumerate(feature_ids):
        for instance_id in instance_ids:
            estimated_counts[i][data[instance_id, feature_id]] += 1
    return estimated_counts
if __name__ == '__main__':
    #
    # creating a large synthetic matrix, testing for performance
    rand_gen = numpy.random.RandomState(1337)
    n_instances = 2000
    n_features = 2000
    large_matrix = rand_gen.binomial(1, 0.5, (n_instances, n_features))
    #
    # random indexes too
    n_sample = 1000
    rand_instance_ids = rand_gen.choice(n_instances, n_sample, replace=False)
    rand_feature_ids = rand_gen.choice(n_features, n_sample, replace=False)
    binary_feature_vals = [2 for i in range(n_features)]
    #
    # testing
    numpy_start_t = perf_counter()
    e_counts_numpy = estimate_counts_numpy(large_matrix,
                                           rand_instance_ids,
                                           rand_feature_ids)
    numpy_end_t = perf_counter()
    print('numpy done in {0} secs'.format(numpy_end_t - numpy_start_t))

    binary_feature_vals = numpy.array(binary_feature_vals)
    #
    #
    curr_feature_vals = binary_feature_vals[rand_feature_ids]
    #
    # creating a data structure to hold the slices
    # (with numba I cannot use list comprehension?)
    # e_counts_numba = [[0 for val in range(feature_val)]
    #                   for feature_val in
    #                   curr_feature_vals]
    e_counts_numba = numpy.zeros((n_sample, 2), dtype='int32')

    numba_start_t = perf_counter()
    estimate_counts_numba(large_matrix,
                          rand_instance_ids,
                          rand_feature_ids,
                          binary_feature_vals,
                          e_counts_numba)
    numba_end_t = perf_counter()
    print('numba done in {0} secs'.format(numba_end_t - numba_start_t))
These are the timings I get when running the code above:
numpy done in 0.2863295429997379 secs
numba done in 11.55551904299864 secs
The point is that when I try to apply the jit with numba, my implementation is even slower, so I strongly suspect I am messing something up.
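A side note on the TODO in estimate_counts_numpy: numpy.bincount has a minlength argument that guarantees a minimum number of bins, which handles both the all-zeros case and non-binary data without manual padding. A minimal sketch (feature_counts and the small array are illustrative, not from the original code):

```python
import numpy

def feature_counts(data_slice, n_vals=2):
    # minlength pads bincount's output, so an all-zeros column (or any
    # column missing its largest value) still yields one bin per value
    return [numpy.bincount(col, minlength=n_vals) for col in data_slice.T]

slice_ = numpy.array([[0, 0],
                      [1, 0],
                      [0, 0]])
counts = feature_counts(slice_)
# column 0 holds [0, 1, 0]; column 1 is all zeros but still gets two bins
```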
Answer 0 (score: 3)
The reason your function is slow is that Numba has fallen back to object mode to compile the loop.
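(An editorial sketch, not part of the original answer: passing nopython=True makes Numba raise a compilation error instead of silently falling back to object mode, so this kind of slowdown surfaces immediately. The import guard lets the snippet run even without Numba installed; the tiny function is purely illustrative.)

```python
import numpy

try:
    import numba

    # nopython=True: raise a compilation error instead of silently
    # falling back to the slow object mode
    @numba.jit(nopython=True)
    def first_row_bin(counts, idx):
        return counts[0, idx]  # tuple indexing compiles fine in nopython mode

    out = first_row_bin(numpy.array([[2, 1], [0, 3]]), 1)
except ImportError:
    # numba not installed: plain numpy indexing gives the same result
    out = numpy.array([[2, 1], [0, 3]])[0, 1]
```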
There are two problems.

First, Numba does not support chained indexing of multidimensional arrays, so you need to rewrite this:

estimated_counts[i][data[instance_id, feature_id]]

into this:

estimated_counts[i, data[instance_id, feature_id]]

Second, the signature you declared does not match the types of the arrays you actually pass in (for example, rand_gen.binomial returns an int64 matrix, not int8), which also keeps the loop out of fast compiled code. The simplest fix is to drop the explicit signature and let Numba infer the types lazily, with a bare

@numba.jit

decorator. Lazy compilation happens on the first call, though. If you don't want to include compilation time, make sure to call the function once before benchmarking. With these changes, I benchmark Numba to be 15% faster than NumPy.
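Putting the suggestions together, a minimal sketch (the import guard, the small arrays, and the estimate_counts name are illustrative additions; the first call doubles as the warm-up that triggers compilation):

```python
import numpy

try:
    import numba
    jit = numba.jit  # lazy compilation: argument types inferred on first call
except ImportError:
    # run the same code without numba as a plain-Python fallback
    def jit(func):
        return func

@jit
def estimate_counts(data, instance_ids, feature_ids, estimated_counts):
    for i, feature_id in enumerate(feature_ids):
        for instance_id in instance_ids:
            # tuple indexing instead of the chained estimated_counts[i][...]
            estimated_counts[i, data[instance_id, feature_id]] += 1
    return estimated_counts

# warm-up call: triggers compilation so later timings measure only execution
data = numpy.array([[0, 1], [1, 1], [0, 0]])
ids = numpy.arange(3)
feats = numpy.arange(2)
counts = estimate_counts(data, ids, feats,
                         numpy.zeros((2, 2), dtype=numpy.int64))
```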