Counting transitions from one specific value to another

Date: 2019-03-26 15:20:52

Tags: python

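My question is very similar to this post: Python : count the number of changes of numbers

Since I cannot comment there yet, I am asking here: is there a faster way?

My code is essentially the same as the code in the linked post, but the ranges of i and j are much larger (about a million in total), so it takes a very long time to run (more than a day!).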

1 Answer:

Answer 0: (score: 1)

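It is definitely better to collect all transition counts into a data structure in a single pass over the data, rather than counting the occurrences of each transition separately. Something like this: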

def count_transitions(numbers):
    n = max(numbers)
    transitions = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(len(numbers) - 1):
        n1 = numbers[i]
        n2 = numbers[i + 1]
        transitions[n1][n2] += 1
    return transitions

An example of how to use it:

test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
test_result = count_transitions(test_data)
for i, row in enumerate(test_result):
    for j, count in enumerate(row):
        print(f'{i} -> {j}: {count}')

Output:

0 -> 0: 0
0 -> 1: 3
0 -> 2: 1
1 -> 0: 2
1 -> 1: 1
1 -> 2: 1
2 -> 0: 2
2 -> 1: 0
2 -> 2: 0
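The table above assumes the values are small non-negative integers, since they are used directly as list indices. If that does not hold (for example, string labels or widely spaced integers), the same single-pass idea can be sketched with `collections.Counter`. This is a swapped-in alternative, not part of the original answer, and the name `count_transitions_counter` is hypothetical:

```python
from collections import Counter

def count_transitions_counter(values):
    # Pair each element with its successor and count identical pairs.
    # `values` is assumed to be a list (or other sliceable sequence).
    return Counter(zip(values, values[1:]))

test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
counts = count_transitions_counter(test_data)
print(counts[(0, 1)])  # 3
print(counts[(2, 1)])  # 0 (pairs that never occur count as zero)
```

Only transitions that actually occur are stored, so memory use depends on the number of distinct pairs rather than on the largest value.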

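Now, the other matter is making it faster. This algorithm should already be much faster, since it has linear complexity rather than cubic, but we can use some tools to improve it further. For example, with NumPy it could look like this: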

import numpy as np

def count_transitions_np(numbers):
    numbers = np.asarray(numbers)
    n = numbers.max()
    transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
    np.add.at(transitions, (numbers[:-1], numbers[1:]), 1)
    return transitions
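To illustrate why `np.add.at` is used there instead of a plain indexed `+=`, and to sketch one possible alternative, the snippet below compares the two and adds a variant based on `np.bincount`. The function name `count_transitions_bincount` is hypothetical, the variant assumes non-negative integer input, and it was not part of the original benchmark, so whether it is faster on your data would need to be measured:

```python
import numpy as np

test_data = [1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1]
a = np.asarray(test_data)

# Plain fancy-indexed += is buffered: a repeated (from, to) pair is
# incremented only once, so the counts come out wrong.
naive = np.zeros((3, 3), dtype=np.int64)
naive[a[:-1], a[1:]] += 1
print(naive.tolist())
# [[0, 1, 1], [1, 1, 1], [1, 0, 0]]

def count_transitions_bincount(numbers):
    numbers = np.asarray(numbers, dtype=np.int64)
    size = int(numbers.max()) + 1
    # Encode each (from, to) pair as one flat index: from * size + to.
    flat = numbers[:-1] * size + numbers[1:]
    # Count each flat index, then reshape into a square table.
    return np.bincount(flat, minlength=size * size).reshape(size, size)

print(count_transitions_bincount(test_data).tolist())
# [[0, 3, 1], [2, 1, 1], [2, 0, 0]]
```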

Or you can use Numba, like this:

import numba as nb
import numpy as np

@nb.njit
def count_transitions_nb(numbers):
    n = 0
    for num in numbers:
        n = max(num, n)
    transitions = np.zeros((n + 1, n + 1), dtype=np.int32)
    for i in range(len(numbers) - 1):
        n1 = numbers[i]
        n2 = numbers[i + 1]
        transitions[n1, n2] += 1
    return transitions

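Finally, another option is to use SciPy to build a sparse matrix. Note that this is not the same thing as a dense matrix, but you may be able to work with it just as well.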

import numpy as np
import scipy.sparse

def count_transitions_sp(numbers):
    numbers = np.asarray(numbers)
    n = numbers.max()
    v = np.ones(len(numbers) - 1, dtype=np.int32)
    return scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), (n + 1, n + 1))
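As a rough illustration of how such a sparse result might be read back, the sketch below rebuilds the same COO matrix inline for the small test list and converts it to CSR. COO format keeps one stored entry per observed transition, duplicates included; they are summed when the matrix is converted to CSR or to a dense array.

```python
import numpy as np
import scipy.sparse

numbers = np.asarray([1, 0, 1, 0, 1, 2, 0, 2, 0, 1, 1])
n = int(numbers.max())
v = np.ones(len(numbers) - 1, dtype=np.int32)
coo = scipy.sparse.coo_matrix((v, (numbers[:-1], numbers[1:])), shape=(n + 1, n + 1))

# One stored entry per transition in the input, duplicates included.
print(coo.nnz)  # 10

# Converting to CSR sums the duplicates and allows element lookup.
csr = coo.tocsr()
print(csr.nnz)    # 6
print(csr[0, 1])  # 3
```

This also suggests part of the reason construction is so cheap: building the COO matrix does not aggregate anything, and the summing happens later on conversion.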

Now a small benchmark:

import random

# Generate input data
random.seed(100)
numbers = [random.randint(0, 1000) for _ in range(1000000)]

# Check results are correct
result1 = count_transitions(numbers)
result2 = count_transitions_np(numbers).tolist()
# Pass an array: Numba's handling of plain Python lists is deprecated
result3 = count_transitions_nb(np.asarray(numbers)).tolist()
result4 = count_transitions_sp(numbers).todense().tolist()
print(result1 == result2)
# True
print(result1 == result3)
# True
print(result1 == result4)
# True

# NumPy version of data for NumPy, Numba and SciPy
numbers_np = np.asarray(numbers)
# Time it with IPython
%timeit count_transitions(numbers)
# 178 ms ± 633 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_np(numbers_np)
# 80.7 ms ± 663 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit count_transitions_nb(numbers_np)
# 5.36 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit count_transitions_sp(numbers_np)
# 4.05 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As you can see, Numba can be very fast, and if you can work with sparse matrices, they are quick to build as well.
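The `%timeit` lines in the benchmark are IPython magics and will not run in a plain Python script. A minimal sketch of a comparable measurement with the standard `timeit` module is shown below. It uses a smaller input than the benchmark above so that it finishes quickly, which means its numbers are not comparable to the ones reported there:

```python
import random
import timeit

def count_transitions(numbers):
    # Same pure-Python function as in the answer above.
    n = max(numbers)
    transitions = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(len(numbers) - 1):
        transitions[numbers[i]][numbers[i + 1]] += 1
    return transitions

# Smaller input than the original benchmark (an assumption for brevity)
random.seed(100)
numbers = [random.randint(0, 100) for _ in range(100_000)]

# repeat() returns, for each of `repeat` runs, the total time of `number` calls.
calls_per_run = 5
timings = timeit.repeat(lambda: count_transitions(numbers), number=calls_per_run, repeat=3)
best_per_call = min(timings) / calls_per_run
print(f"best of 3 runs: {best_per_call * 1e3:.2f} ms per call")
```

When timing the Numba version this way, call the function once beforehand so that the one-time compilation is not included in the measurement.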