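Faster Python technique to count triples from a list of numbers that are multiples of each other

Time: 2018-08-11 23:25:42

Tags: python performance list-comprehension

Suppose we have a list of numbers, l. I need to COUNT all tuples of length 3 (l_i, l_j, l_k) from l such that l_i evenly divides l_j and l_j evenly divides l_k, with the stipulation that the indices i, j, k satisfy i < j < k.

For example:

If l=[1,2,3,4,5,6], then the tuples would be [1,2,6], [1,3,6], [1,2,4], so the COUNT would be 3.

If l=[1,1,1], then the only tuple would be [1,1,1], so the COUNT would be 1.

Here's what I've done so far, using list comprehensions: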

def myCOUNT(l):
    newlist=[[x,y,z] for x in l for y in l for z in l if (z%y==0 and y%x==0 and l.index(x)<l.index(y) and l.index(y)<l.index(z))]
    return len(newlist)

>>>l=[1,2,3,4,5,6]
>>>myCOUNT(l)
3

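This works, but as l gets longer (and it can be as long as 2000 elements), the time it takes increases too much. Is there a faster/better way to do this?

5 answers:

Answer 0: (score: 16)

We can count the number of triples with a given number in the middle by counting how many factors of that number are to its left, counting how many multiples of that number are to its right, and multiplying the two counts. Doing this for any given middle element is O(n) for a list of length n, and doing it for all n possible middle elements is O(n^2).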

def num_triples(l):
    total = 0
    for mid_i, mid in enumerate(l):
        num_left = sum(1 for x in l[:mid_i] if mid % x == 0)
        num_right = sum(1 for x in l[mid_i+1:] if x % mid == 0)
        total += num_left * num_right
    return total

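Incidentally, the code in your question doesn't actually work. It has fallen into the common newbie trap of calling index instead of using enumerate to get iteration indices. Beyond being inefficient, this is actually wrong when the input has duplicate elements, causing your myCOUNT to return 0 instead of 1 on the [1, 1, 1] example input.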
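The pitfall can be reproduced with a short sketch. The name my_count_index is only a stand-in for the question's myCOUNT (same logic, reformatted), and num_triples is repeated from this answer so that the snippet runs on its own.

```python
def my_count_index(l):
    # The question's approach: positions are looked up with list.index,
    # which always returns the position of the FIRST equal element.
    return len([
        (x, y, z)
        for x in l for y in l for z in l
        if z % y == 0 and y % x == 0
        and l.index(x) < l.index(y) < l.index(z)
    ])


def num_triples(l):
    # The O(n^2) approach from this answer: fix the middle element and
    # multiply the divisors on its left by the multiples on its right.
    total = 0
    for mid_i, mid in enumerate(l):
        num_left = sum(1 for x in l[:mid_i] if mid % x == 0)
        num_right = sum(1 for x in l[mid_i + 1:] if x % mid == 0)
        total += num_left * num_right
    return total


print([1, 1, 1].index(1))          # 0, whichever of the three 1s is meant
print(my_count_index([1, 1, 1]))   # 0
print(num_triples([1, 1, 1]))      # 1
```

With duplicates, every copy of a value reports the same index, so the strict comparison between indices can never hold for equal elements.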

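Answer 1: (score: 4)

Finding all tuples in O(n^2)

Your algorithm iterates over all possible combinations, which makes it O(n^3).

Instead, you should precompute the division tree of your list of numbers and recover the triples from paths down the tree.

Division tree

A division tree is a graph whose nodes are the numbers and in which the children of each node are the multiples of that number.

For example, given the list [1, 2, 3, 4], the division tree looks like this.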

   1
  /|\
 2 | 3
  \|
   4

Computing the division tree requires comparing each number against all the others, which makes its creation O(n^2).
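As a minimal sketch of that pairwise comparison, the graph for the small example can be written as a plain dictionary. The helper name division_graph is made up for this illustration, and the snippet assumes distinct positive integers.

```python
def division_graph(values):
    # Map every number to the list of its proper multiples in `values`.
    # Each number is compared with every other one, hence O(n^2).
    return {x: [y for y in values if y != x and y % x == 0] for x in values}


print(division_graph([1, 2, 3, 4]))
# {1: [2, 3, 4], 2: [4], 3: [], 4: []}
```

The edges 1 -> 2, 1 -> 3, 1 -> 4 and 2 -> 4 are exactly the lines of the diagram.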

Here is a basic implementation of a division tree that can be used for your problem.

class DivisionTree:
    def __init__(self, values):
        values = sorted(values)

        # For a division-tree to be connected, we need 1 to be its head
        # Thus we artificially add it and note whether it was actually in our numbers
        if 1 in values:
            self._has_one = True
            values = values[1:]
        else:
            self._has_one = False

        self._graph = {1: []}

        for v in values:
            self.insert(v)

    def __iter__(self):
        """Iterate over all values of the division-tree"""
        yield from self._graph

    def insert(self, n):
        """Insert value in division tree, adding it as child of each divisor"""
        self._graph[n] = []

        for x in self:
            if n != x and n % x == 0:
                self._graph[x].append(n)

    def paths(self, depth, _from=1):
        """Return a generator of all paths of *depth* down the division-tree"""
        if _from == 1:
            for x in self._graph[_from]:
                yield from self.paths(depth , _from=x)

        if depth == 1:
            yield [_from]
            return

        if _from != 1 or self._has_one:
            for x in self._graph[_from]:
                for p in self.paths(depth - 1, _from=x):
                    yield [_from, *p]

Usage

Once a DivisionTree has been built, it suffices to iterate over all paths down the graph and select only those of length 3.

Example

l = [1,2,3,4,5,6]

dt = DivisionTree(l)

for p in dt.paths(3):
    print(p)

Output

[1, 2, 4]
[1, 2, 6]
[1, 3, 6]

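This solution assumes that the list of numbers is initially sorted, as in your example. The output could, however, be filtered on the index condition i < j < k to provide a more general solution.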
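One way to respect the index condition on an unsorted list, sketched here as an assumption rather than as part of the DivisionTree class, is to build the same kind of edges between positions instead of values. The function name count_triples_by_index is hypothetical, and the sketch only counts the triples (it does not list them) and assumes positive integers.

```python
def count_triples_by_index(l):
    n = len(l)
    # children[j] = number of positions k > j with l[k] divisible by l[j]
    children = [
        sum(1 for k in range(j + 1, n) if l[k] % l[j] == 0)
        for j in range(n)
    ]
    # Every edge i -> j (i < j, l[i] divides l[j]) extends to children[j] triples.
    return sum(
        children[j]
        for i in range(n)
        for j in range(i + 1, n)
        if l[j] % l[i] == 0
    )


print(count_triples_by_index([1, 2, 3, 4, 5, 6]))  # 3
print(count_triples_by_index([2, 1, 4, 8]))        # 2: (2, 4, 8) and (1, 4, 8)
```

Both passes are double loops over positions, so the sketch stays O(n^2) and also handles duplicates such as [1, 1, 1].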

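Time complexity

Generating the division tree is O(n^2).

In turn, there can be at most n! different paths, although stopping the iteration whenever we go deeper than 3 prevents traversing them all. This makes us iterate over the following paths:

  • the paths corresponding to three-tuples, say there are m of them;

  • the paths corresponding to two-tuples, of which there are O(n^2);

  • the paths corresponding to one-tuples, of which there are O(n).

Thus this overall yields an O(n^2 + m) algorithm.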
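The three kinds of paths can be counted on the small example with a short sketch. The helper path_counts is an illustrative name, not part of the class above, and it assumes distinct positive integers.

```python
def path_counts(values):
    # Edges of the division graph: x -> y when x properly divides y.
    graph = {x: [y for y in values if y != x and y % x == 0] for x in values}
    one = len(graph)                                   # one-tuples: the nodes
    two = sum(len(ch) for ch in graph.values())        # two-tuples: the edges
    three = sum(len(graph[y]) for ch in graph.values() for y in ch)  # m
    return one, two, three


print(path_counts([1, 2, 3, 4, 5, 6]))   # (6, 8, 3)
```

For [1, 2, 3, 4, 5, 6] there are 6 nodes, 8 edges and m = 3 triples, which matches the three paths printed in the example.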

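Answer 2: (score: 3)

I suppose this solution without a list comprehension will be faster (an equivalent written with a comprehension follows further down):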

a = [1, 2, 3, 4, 5, 6]

def count(a):   
    result = 0
    length = len(a)

    for i in range(length):
        for j in range(i + 1, length):
            for k in range(j + 1, length):
                if a[j] % a[i] == 0 and a[k] % a[j] == 0:
                    result += 1

    return result

print(count(a))

Output

3

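In your solution the index method is too expensive (it requires O(n) operations). Also, you don't need to iterate over the full list for each of x, y and z (x = a[i], y = a[j], z = a[k]). Notice how I use indexes in the loops for y and z, because that way I know a.index(x) < a.index(y) < a.index(z) is always satisfied.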
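To get a feel for how much work the three nested loops still do, the sketch below counts how often the innermost condition is evaluated and compares it with the binomial coefficient C(n, 3). The helper name count_index_triples is only illustrative.

```python
from math import comb  # available from Python 3.8


def count_index_triples(n):
    # Number of times the innermost condition is evaluated by the three loops.
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        for k in range(j + 1, n)
    )


for n in (6, 50):
    assert count_index_triples(n) == comb(n, 3)

print(comb(6, 3))      # 20 index triples for the six-element example
print(comb(2000, 3))   # 1331334000 for a 2000-element list
```

So the loops skip the index lookups and about five sixths of the n^3 combinations, but the running time still grows cubically.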


You can also write it as a one-liner:

def count(a):   
    length = len(a)

    return sum(1 for i in range(length)
               for j in range(i + 1, length)
               for k in range(j + 1, length)
               if a[j] % a[i] == 0 and a[k] % a[j] == 0)

P.S. Please don't use the letter l in variable names, because it looks very similar to 1 :)

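Answer 3: (score: 1)

There is a way to do this with itertools combinations.

Since combinations generates the tuples in list order, you then do not need to check the index of z.

Your myCOUNT function then becomes:

from itertools import combinations

def cnt(li):
    return sum(1 for x,y,z in combinations(li,3) if z%y==0 and y%x==0)

>>> cnt([1,1,1])
1
>>> cnt([1,2,3,4,5,6])
3

The script below times some of the solutions (f1 is cnt, f2 is the question's myCOUNT, f3 is the triple loop, f4 uses DivisionTree, f5 is num_triples):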

from itertools import combinations 

class DivisionTree:
    def __init__(self, values):    
        # For a division-tree to be connected, we need 1 to be its head
        # Thus we artificially add it and note whether it was actually in our numbers
        if 1 in values:
            self._has_one = True
            values = values[1:]
        else:
            self._has_one = False

        self._graph = {1: []}

        for v in values:
            self.insert(v)

    def __iter__(self):
        """Iterate over all values of the division-tree"""
        yield from self._graph

    def insert(self, n):
        """Insert value in division tree, adding it as child of each divisor"""
        self._graph[n] = []

        for x in self:
            if n != x and n % x == 0:
                self._graph[x].append(n)

    def paths(self, depth, _from=1):
        """Return a generator of all paths of *depth* down the division-tree"""
        if _from == 1:
            for x in self._graph[_from]:
                yield from self.paths(depth , _from=x)

        if depth == 1:
            yield [_from]
            return

        if _from != 1 or self._has_one:
            for x in self._graph[_from]:
                for p in self.paths(depth - 1, _from=x):
                    yield [_from, *p]


def f1(li):
    return sum(1 for x,y,z in combinations(li,3) if z%y==0 and y%x==0)

def f2(l):
    newlist=[[x,y,z] for x in l for y in l for z in l if (z%y==0 and y%x==0 and l.index(x)<l.index(y) and l.index(y)<l.index(z))]
    return len(newlist)

def f3(a):   
    result = 0
    length = len(a)

    for i in range(length):
        for j in range(i + 1, length):
            for k in range(j + 1, length):
                if a[j] % a[i] == 0 and a[k] % a[j] == 0:
                    result += 1

    return result

def f4(l):
    dt = DivisionTree(l)
    return sum(1 for _ in dt.paths(3))  

def f5(l):
    total = 0
    for mid_i, mid in enumerate(l):
        num_left = sum(1 for x in l[:mid_i] if mid % x == 0)
        num_right = sum(1 for x in l[mid_i+1:] if x % mid == 0)
        total += num_left * num_right
    return total    


if __name__=='__main__':
    import timeit
    tl=list(range(3,155))
    funcs=(f1,f2,f3,f4,f5)
    td={f.__name__:f(tl) for f in funcs}
    print(td)
    for case, x in (('small',50),('medium',500),('large',5000)):
        li=list(range(2,x))
        print('{}: {} elements'.format(case,x))
        for f in funcs:
            print("   {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("f(li)", setup="from __main__ import f, li", number=1))) 

This is a known problem.

Here are some timings of some of the solutions:

{'f1': 463, 'f2': 463, 'f3': 463, 'f4': 463, 'f5': 463}
small: 50 elements
       f1    0.0010 secs
       f2    0.0056 secs
       f3    0.0018 secs
       f4    0.0003 secs
       f5    0.0002 secs
medium: 500 elements
       f1    1.1702 secs
       f2    5.3396 secs
       f3    1.8519 secs
       f4    0.0156 secs
       f5    0.0110 secs
large: 5000 elements
       f1    1527.4956 secs
       f2    6245.9930 secs
       f3    2074.2257 secs
       f4    1.3492 secs
       f5    1.2993 secs

You can see that f1, f2, f3 are clearly O(n^3) or worse and f4, f5 are O(n^2). f2 took more than 90 minutes for what f4 and f5 did in 1.3 seconds.
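As a rough cross-check of those orders of growth, the exponent can be estimated from the medium and large timings. This is only a back-of-the-envelope sketch; the numbers are copied from the benchmark output above, and the variable and function names are made up for the illustration.

```python
import math

# Timings (seconds) copied from the benchmark output above.
medium = {"f1": 1.1702, "f2": 5.3396, "f3": 1.8519, "f4": 0.0156, "f5": 0.0110}
large = {"f1": 1527.4956, "f2": 6245.9930, "f3": 2074.2257, "f4": 1.3492, "f5": 1.2993}

# list(range(2, x)) has x - 2 elements.
n_medium, n_large = 500 - 2, 5000 - 2


def empirical_exponent(name):
    # If time ~ c * n**p, then p = log(t_large / t_medium) / log(n_large / n_medium).
    return math.log(large[name] / medium[name]) / math.log(n_large / n_medium)


for name in sorted(medium):
    print(name, round(empirical_exponent(name), 2))
```

The estimates come out close to 3 for f1, f2 and f3 and close to 2 for f4 and f5, which is consistent with the conclusion drawn from the timings.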

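Answer 4: (score: 0)

A solution for a sorted list of positive numbers in O(M*log(M))

As user2357112 has answered, we can count the number of triples in O(n^2) by calculating, for every number, how many factors and how many multiples it has. However, if instead of comparing every pair we go over the multiples of each number that are no larger than the largest number and check whether they are in the list, we can change the efficiency to O(N+M*log(N)), where M is the largest number in the list.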
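The logarithmic factor comes from the harmonic series: a number i contributes about M/i steps, and M/1 + M/2 + ... + M/M is roughly M*ln(M). The sketch below counts the steps of a loop over the multiples of every i for the worst case where the list is [1, 2, ..., M]; the helper name inner_iterations is an assumption for this illustration.

```python
import math


def inner_iterations(M):
    # Iterations of `for j in range(2*i, M+1, i)` summed over i = 1..M,
    # i.e. the worst case where the list is [1, 2, ..., M].
    return sum(len(range(2 * i, M + 1, i)) for i in range(1, M + 1))


for M in (10, 1000, 100000):
    # Second column: actual step count; third column: M * ln(M) for comparison.
    print(M, inner_iterations(M), round(M * math.log(M)))
```

The step count stays below M*(ln(M) + 1), so the loop over multiples is far cheaper than comparing all pairs as long as M is not much larger than the list itself.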

Code:

def countTriples(myList):
    counts = {} #Contains the number of appearances of every number.
    factors = {} #Contains the number of factors of every number.
    multiples = {} #Contains the number of multiples of every number.

    for i in myList: #Initializing the dictionaries.
        counts[i] = 0
        factors[i] = 0
        multiples[i] = 0

    maxNum = max(myList) #The maximum number in the list.

    #First, we count the number of appearances of every number.
    for i in myList:
        counts[i] += 1

    #Then, for every number in the list, we check whether its multiples are in the list.
    for i in counts:
        for j in range(2*i, maxNum+1, i):
            if j in counts: #dict.has_key was removed in Python 3.
                factors[j] += counts[i]
                multiples[i] += counts[j]

    #Finally, we count the number of triples.
    ans = 0
    for i in counts:
        ans += counts[i]*factors[i]*multiples[i] #Counting triplets with three numbers.
        ans += counts[i]*(counts[i]-1)*factors[i]//2 #Counting triplets with two larger and one smaller number.
        ans += counts[i]*(counts[i]-1)*multiples[i]//2 #Counting triplets with two smaller numbers and one larger number.
        ans += counts[i]*(counts[i]-1)*(counts[i]-2)//6 #Counting triplets with three copies of the same number.

    return ans

While this solution works quickly for lists containing many small numbers, it will not work for lists containing large numbers:

countTriples(list(range(1,1000000))) #Took 80 seconds on my computer
countTriples([1,2,1000000000000]) #Will take a very long time

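A fast solution with unknown efficiency for unsorted lists

Another way to count the number of multiples and factors of every number in the list is to use a binary tree data structure whose leaves correspond to the numbers. The data structure supports three operations:

1) Add a number to every position that is a multiple of a given number.

2) Add a number to every position specified in a set.

3) Get the value of a position.

We use lazy propagation, and propagate the updates from the root to lower nodes only during queries.

To find the number of factors of every item in the list, we iterate over the list, query the number of factors of the current item from the data structure, and add 1 to every position that is a multiple of the item.

To find the number of multiples of every item, we first find, for every item in the list, all of its factors, using the algorithm described in the previous solution.

We then iterate over the list in reverse order. For every item, we query the number of its multiples from the data structure, and add 1 to each of its factors in the data structure.

Finally, for every item, we add the product of its number of factors and its number of multiples to the answer.

Code: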

'''A tree that supports two operations:
    addOrder(num) - If given a number, adds 1 to all the values which are multiples of the given number. If given a tuple, adds 1 to all the values in the tuple.
    getValue(num) - returns the value of the number.
    Uses lazy evaluation to speed up the algorithm.
'''
class fen:
    '''Initiates the tree from either a list, or a segment of the list designated by s and e'''
    def __init__(this, l, s = 0, e = -1):
        if(e == -1): e = len(l)-1
        this.x1 = l[s]
        this.x2 = l[e]

        this.val = 0
        this.orders = {}

        if(s != e):
            this.s1 = fen(l, s, (s+e)//2)
            this.s2 = fen(l, (s+e)//2+1, e)
        else:
            this.s1 = None
            this.s2 = None

    '''Testing if a multiple of the number appears in the range of this node.'''
    def _numGood(this, num):
        if(this.x2-this.x1+1 >= num): return True
        m1 = this.x1%num
        m2 = this.x2%num
        return m1 == 0 or m1 > m2

    '''Testing if a member of the group appears in the range of this node.'''
    def _groupGood(this, group):
        low = 0
        high = len(group)
        if(this.x1 <= group[0] <= this.x2): return True
        while(low != high-1):
            mid = (low+high)//2
            if(group[mid] < this.x1): low = mid
            elif(group[mid] > this.x2): high = mid
            else: return True
        return False

    def _isGood(this, val):
        if(type(val) == tuple):
            return this._groupGood(val)
        return this._numGood(val)

    '''Adds an order to this node.'''
    def addOrder(this, num, count = 1):
        if(not this._isGood(num)): return
        if(this.x1 == this.x2): this.val += count
        else :this.orders[num] = this.orders.get(num, 0)+count

    '''Pushes the orders to lower nodes.''' 
    def _pushOrders(this):
        if(this.x1 == this.x2): return
        for i in this.orders:
            this.s1.addOrder(i, this.orders[i])
            this.s2.addOrder(i, this.orders[i])
        this.orders = {}

    def getValue(this, num):
        this._pushOrders()
        if(num < this.x1 or num > this.x2):
            return 0
        if(this.x1 == this.x2):
            return this.val
        return this.s1.getValue(num)+this.s2.getValue(num)

def countTriples2(myList):
    factors = [0 for i in myList]
    multiples = [0 for i in myList]

    numSet = set((abs(i) for i in myList))
    sortedList = sorted(list(numSet))

    #Calculating factors.
    tree = fen(sortedList)
    for i in range(len(myList)):
        factors[i] = tree.getValue(abs(myList[i]))
        tree.addOrder(abs(myList[i]))

    #Calculating the divisors of every number in the group.
    mxNum = max(numSet)
    divisors = {i:[] for i in numSet}

    for i in sortedList:
        for j in range(i, mxNum+1, i):
            if(j in numSet):
                divisors[j].append(i)
    divisors = {i:tuple(divisors[i]) for i in divisors}
    #Calculating the number of multiples to the right of every number.
    tree = fen(sortedList)
    for i in range(len(myList)-1, -1, -1):
        multiples[i] = tree.getValue(abs(myList[i]))
        tree.addOrder(divisors[abs(myList[i])])

    ans = 0
    for i in range(len(myList)):
        ans += factors[i]*multiples[i]

    return ans

On my computer, this solution handled a list containing the numbers 1..10000 in 6 seconds, and a list containing the numbers 1..100000 in 87 seconds.