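My question is very similar to this post: Python : count the number of changes of numbers
However, since I can't comment there yet, I'd like to ask here: is there a faster way?
My code is basically the same as the one in the link, but the ranges of i and j are much larger (about a million in total), so it takes a very long time (more than a day!).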
Answer:
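It is definitely better to store all the transition counts in a data structure in a single pass, instead of counting the occurrences of each transition separately. Something like this: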
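def count_transitions(numbers):
    # values are used as indices, so this assumes non-negative integers
    n = max(numbers)
    # transitions[a][b] counts how often value a is immediately followed by value b
    transitions = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(len(numbers) - 1):
        n1 = numbers[i]
        n2 = numbers[i + 1]
        transitions[n1][n2] += 1
    return transitions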
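If the values are not small non-negative integers, or most of the (n + 1) x (n + 1) table would stay zero, a dictionary-like structure may fit better than a nested list. Below is a minimal sketch using collections.Counter; this is a swapped-in alternative, not the approach above, and the name count_transitions_counter is hypothetical:

```python
from collections import Counter


def count_transitions_counter(numbers):
    """Count how often each value is immediately followed by each other value.

    Returns a Counter mapping (current, next) pairs to counts; pairs that
    never occur are absent rather than stored as zeros.
    """
    # zip pairs each element with its successor: (x0, x1), (x1, x2), ...
    return Counter(zip(numbers, numbers[1:]))


test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
counts = count_transitions_counter(test_data)
print(counts[(0, 1)])  # 3
print(counts[(2, 2)])  # 0, missing keys count as zero
```

Memory then scales with the number of distinct transitions that actually occur rather than with (n + 1)², and the values can be any hashable objects.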
Example of how to use it:
test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
    for j, count in enumerate(row):
        print(f'{i} -> {j}: {count}')
Output:
0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
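To make the complexity difference concrete, here is a hypothetical reconstruction of a per-pair approach (assuming the linked code loops over every value i, every value j, and scans the whole list for each pair), next to a single-pass version equivalent to count_transitions above. With k distinct values and a list of length L, the first does roughly k² · L comparisons, while the second does roughly L updates plus allocating the k x k table:

```python
def count_transitions_naive(numbers):
    # Hypothetical per-pair approach: for every (i, j), rescan the whole list.
    n = max(numbers)
    transitions = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            for k in range(len(numbers) - 1):
                if numbers[k] == i and numbers[k + 1] == j:
                    transitions[i][j] += 1
    return transitions


def count_transitions(numbers):
    # Single pass: each adjacent pair updates exactly one cell.
    n = max(numbers)
    transitions = [[0] * (n + 1) for _ in range(n + 1)]
    for a, b in zip(numbers, numbers[1:]):
        transitions[a][b] += 1
    return transitions


test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
print(count_transitions_naive(test_data) == count_transitions(test_data))  # True
```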
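Next, let's make it faster. This algorithm should already be much faster than the one in the linked question, since it makes a single pass over the data (linear complexity) instead of being cubic, but some tools can speed it up further. With NumPy, for example, it could look like this: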
import numpy as np
def count_transitions_np(numbers):
    numbers = np.asarray(numbers)
    n = numbers.max()
    transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
    # np.add.at is unbuffered, so repeated (from, to) pairs are all counted
    np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
    return transitions
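np.add.at is needed here because plain fancy-index assignment is buffered: with transitions[src, dst] += 1, each repeated (src, dst) pair is incremented only once. The sketch below demonstrates that pitfall on the test data and also shows np.bincount over a flattened index as another option (swapped in, not from the answer above). np.add.at has been comparatively slow in some NumPy versions, so it may be worth timing both on your own data:

```python
import numpy as np

numbers = np.array([1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1])
n = numbers.max()
src, dst = numbers[:-1], numbers[1:]

# Pitfall: buffered fancy-index assignment applies each repeated index only once.
buffered = np.zeros((n + 1, n + 1), dtype=np.int64)
buffered[src, dst] += 1
print(buffered[0, 1])  # 1, although 0 -> 1 occurs 3 times

# np.add.at is unbuffered, so repeated (src, dst) pairs accumulate.
unbuffered = np.zeros((n + 1, n + 1), dtype=np.int64)
np.add.at(unbuffered, (src, dst), 1)
print(unbuffered[0, 1])  # 3

# Alternative: encode each pair as one flat index and count with bincount.
flat = src * (n + 1) + dst
via_bincount = np.bincount(flat, minlength=(n + 1) ** 2).reshape(n + 1, n + 1)
print(np.array_equal(via_bincount, unbuffered))  # True
```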
Or you can use Numba like this:
import numba as nb
import numpy as np
@nb.njit
def count_transitions_nb(numbers):
    # largest value (starting from 0 assumes non-negative integers)
    n = 0
    for num in numbers:
        n = max(num, n)
    transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
    for i in range(len(numbers) - 1):
        n1 = numbers[i]
        n2 = numbers[i + 1]
        transitions[n1, n2] += 1
    return transitions
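Finally, another option is to build a sparse matrix with SciPy. Note that the result is a sparse matrix rather than a dense one, but depending on what you do with it afterwards, you may be able to use it directly.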
import numpy as np
import scipy.sparse
def count_transitions_sp(numbers):
    numbers = np.asarray(numbers)
    n = numbers.max()
    # one entry of value 1 per transition; duplicates are summed on conversion
    v = np.ones(len(numbers) - 1, dtype=np.int32)
    return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), shape=(n + 1, n + 1))
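One detail of the COO format worth knowing: the constructor keeps every (row, col, value) triple, including repeated pairs, and they are only summed when the matrix is converted (for example with tocsr(), toarray() or todense()). A small sketch on the test data, assuming SciPy is installed:

```python
import numpy as np
import scipy.sparse

numbers = np.array([1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1])
n = numbers.max()
ones = np.ones(len(numbers) - 1, dtype=np.int32)
coo = scipy.sparse.coo_matrix((ones, (numbers[:-1], numbers[1:])), shape=(n + 1, n + 1))

# COO stores one entry per transition, duplicates included.
print(coo.nnz)  # 10

# Converting to CSR sums duplicate (row, col) entries into a single count.
csr = coo.tocsr()
print(csr[0, 1])  # 3
print(csr.nnz)  # 6 distinct transitions
```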
Now a small benchmark:
import random
# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]
# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
result3 = count_transitions_nb(numbers).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True
# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
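As you can see, Numba can be very fast, and if a sparse matrix works for your use case, it is also very fast to build. Keep in mind that the COO constructor only stores the entries; duplicate transitions are summed later, when the matrix is converted, so that part of the work is deferred rather than avoided.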
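%timeit is an IPython magic. Outside IPython, a rough equivalent using the standard-library timeit module might look like the sketch below. It assumes a smaller input (100,000 numbers) to keep the run short and leaves out the Numba and SciPy versions to avoid extra dependencies; the timings you get will depend on your machine and NumPy version:

```python
import random
import timeit

import numpy as np


def count_transitions(numbers):
    # Pure-Python single pass, same idea as count_transitions above.
    n = max(numbers)
    transitions = [[0] * (n + 1) for _ in range(n + 1)]
    for a, b in zip(numbers, numbers[1:]):
        transitions[a][b] += 1
    return transitions


def count_transitions_np(numbers):
    numbers = np.asarray(numbers)
    n = numbers.max()
    transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
    np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
    return transitions


random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(100_000)]
numbers_np = np.asarray(numbers)

# Check that both versions agree before timing them.
assert count_transitions(numbers) == count_transitions_np(numbers_np).tolist()

timings = {}
for name, func, data in [
    ("pure Python", count_transitions, numbers),
    ("NumPy", count_transitions_np, numbers_np),
]:
    # Best of 5 repeats of 3 calls each, converted to seconds per call.
    best = min(timeit.repeat(lambda: func(data), number=3, repeat=5)) / 3
    timings[name] = best
    print(f"{name}: {best * 1000:.1f} ms per call")
```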