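Procedure to map all relationships between elements in a list of lists

Asked: 2016-09-24 13:48:16

Tags: python algorithm time-complexity

I'm looking for an algorithm that maps all the relationships between all the elements in the sublists of a list of length n.

More specifically, say a, b, c, d, e, f are workers' names and each sublist represents a "shift" that took place yesterday. For each worker, I would like to know who they worked with yesterday.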

shifts_yesterday = [[a, b, c, d], [b, c, e, f]] 

Goal:

a: b, c, d
b: a, c, d, e, f
c: a, b, d, e, f
d: a, b, c
e: b, c, f
f: b, c, e

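Above, I can see that a worked with b, c, d yesterday; b worked with a, c, d, e, f yesterday; and so on.

Time complexity is a concern, as I have a large list to process. Intuitively, though, I suspect there is a fairly high floor on the cost of this...

Note: I could obviously write the straightforward approach with nothing but for loops and linear searches, but that is (a) not very clever and (b) very slow.

Edit:

Here is a (messy) attempt: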

shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
workers = [i for s in shifts for i in s]

import collections
d = collections.defaultdict(list)

for w in workers:
    for s in shifts:
        for i in s:
            if i != w and w in s:
                if w in d.keys():
                    if i not in d[w]:
                        d[w].append(i)
                else:
                    d[w].append(i)

Test:

for k, v in collections.OrderedDict(sorted(d.items())).items():
    print(k, v)

Edit 2:

Timings:

  1. Mine: %%timeit -r 10 -> 10000 loops, best of 10: 19 µs per loop

  2. Padraic Cunningham: %%timeit -r 10 -> 100000 loops, best of 10: 4.89 µs per loop

  3. zvone: %%timeit -r 10 -> 100000 loops, best of 10: 3.88 µs per loop

  4. pneumatics: %%timeit -r 10 -> 10000 loops, best of 10: 33.5 µs per loop
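
The timings above were taken on the two-shift sample, so they mostly measure constant overhead. Below is a rough sketch of how the two set-based approaches from the answers might be compared on larger synthetic input. The helper `make_shifts`, the function names `by_update` and `by_combinations`, and the chosen sizes are assumptions made for illustration; they do not come from the original post.

```python
import random
import timeit
from collections import defaultdict
from itertools import combinations


def make_shifts(n_shifts, shift_size, n_workers, seed=0):
    """Build hypothetical random shifts; workers are the integers 0..n_workers-1."""
    rng = random.Random(seed)
    return [rng.sample(range(n_workers), shift_size) for _ in range(n_shifts)]


def by_update(shifts):
    """Union each shift into every member's set, then drop the worker itself."""
    result = defaultdict(set)
    for shift in shifts:
        for worker in shift:
            result[worker].update(shift)
    for worker, others in result.items():
        others.discard(worker)
    return dict(result)


def by_combinations(shifts):
    """Record both directions of every pair of workers within a shift."""
    result = defaultdict(set)
    for shift in shifts:
        for a, b in combinations(shift, 2):
            result[a].add(b)
            result[b].add(a)
    return dict(result)


if __name__ == "__main__":
    shifts = make_shifts(n_shifts=200, shift_size=10, n_workers=500)
    for func in (by_update, by_combinations):
        seconds = timeit.timeit(lambda: func(shifts), number=20)
        print("{}: {:.4f} s for 20 runs".format(func.__name__, seconds))
```

Absolute numbers will depend on the machine and on how large the shifts are, so the relative ranking on the sample input need not carry over.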

5 answers:

Answer 0: (score: 3)

from collections import defaultdict

result = defaultdict(set)

for shift in shifts:
    for worker in shift:
        result[worker].update(shift)

# now result['a'] contains a, b, c, d, so remove each worker from their own set

for k, v in result.items():
    v.remove(k)

Answer 1: (score: 2)

A simplified and more efficient version of your own code, using sets to store the values and itertools.combinations to pair up the workers:

shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]


from itertools import combinations
import collections

d = collections.defaultdict(set)
for sub in shifts:
    for a, b in combinations(sub, 2):
        d[a].add(b)
        d[b].add(a)

for k, v in sorted(d.items()):
    print(k, v)

Which would give you:

('a', set(['c', 'b', 'd']))
('b', set(['a', 'c', 'e', 'd', 'f']))
('c', set(['a', 'b', 'e', 'd', 'f']))
('d', set(['a', 'c', 'b']))
('e', set(['c', 'b', 'f']))
('f', set(['c', 'b', 'e']))
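
The sets above print in arbitrary order. A small sketch that renders the same mapping in the format of the question's goal follows; `format_coworkers` is a hypothetical helper name, and it assumes worker names are strings so they can be sorted and joined.

```python
from collections import defaultdict
from itertools import combinations


def format_coworkers(shifts):
    """Return lines such as 'a: b, c, d', sorted by worker and by co-worker."""
    d = defaultdict(set)
    for sub in shifts:
        for a, b in combinations(sub, 2):
            d[a].add(b)
            d[b].add(a)
    return ["{}: {}".format(k, ", ".join(sorted(v))) for k, v in sorted(d.items())]


shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
for line in format_coworkers(shifts):
    print(line)
```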

On your small sample input:

In [1]: import collections

In [2]: %%timeit
   ...: shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
   ...: workers = [i for s in shifts for i in s]
   ...: d = collections.defaultdict(list)
   ...: for w in workers:
   ...:     for s in shifts:
   ...:         for i in s:
   ...:             if i != w and w in s:
   ...:                 if w in d.keys():
   ...:                     if i not in d[w]:
   ...:                         d[w].append(i)
   ...:                 else:
   ...:                     d[w].append(i)
   ...: 
10000 loops, best of 3: 21.6 µs per loop

In [3]: from itertools import combinations

In [4]: %%timeit
   ...: shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
   ...: d = collections.defaultdict(set)
   ...: for sub in shifts:
   ...:     for a, b in combinations(sub, 2):
   ...:         d[a].add(b)
   ...:         d[b].add(a)
   ...: 
100000 loops, best of 3: 4.55 µs per loop

Answer 2: (score: 1)

Pseudocode algorithm:

declare two-dimensional array workers
for each shift in shifts_yesterday
    for each element x in shift
        add x to workers[x]
        for each element y != x in shift
            add y to workers[x]

for each list xs in workers
    print xs[0] + ": "
    for each element w in xs except the first
        print w + ", "

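The time complexity is O(n*m^2 + w*m), where n is the number of shifts, m is the maximum number of workers in any shift, and w is the total number of workers. If you can settle for seeing each pair of workers only once (not showing both a: b and b: a), you can cut out an m. This is a quadratic algorithm, and I believe that's the best you can do.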
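
As an illustration of where the n*m^2 term comes from, here is one possible Python rendering of the nested loops. It is a sketch, not the answerer's code: `map_coworkers` and the `inner_steps` counter are names introduced here, and a dict of sets stands in for the "two-dimensional array" so that duplicates are dropped without an extra scan.

```python
def map_coworkers(shifts):
    """Nested-loop version of the pseudocode; returns (mapping, inner_steps)."""
    workers = {}
    inner_steps = 0
    for shift in shifts:                 # n shifts
        for x in shift:                  # up to m workers per shift
            coworkers = workers.setdefault(x, set())
            for y in shift:              # up to m workers again
                inner_steps += 1
                if y != x:
                    coworkers.add(y)
    return workers, inner_steps


shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
workers, steps = map_coworkers(shifts_yesterday)
for name in sorted(workers):
    print("{}: {}".format(name, ", ".join(sorted(workers[name]))))
print("inner loop iterations: {}".format(steps))  # 32 = 2 shifts * 4 * 4
```

For the sample, n = 2 and m = 4, so the innermost loop body runs 2 * 4 * 4 = 32 times.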

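Answer 3: (score: 1)

More conditions should be specified. For example, if the total size of the "shifts_yesterday" array is limited to 64, you can use a long to store a bitmask of shifts for each worker. Then you can answer the question with a single operation: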

a = 00000001
b = 00000011
d = 00000001
f = 00000010

Did b work with d?

((b & d) != 0) : true

Did a work with f?

((a & f) != 0) : false
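
A minimal Python sketch of the bitmask idea follows. The helper names `shift_masks` and `worked_together` are assumptions introduced here. Python integers have arbitrary precision, so the 64-shift limit that a fixed-width long would impose does not apply to this particular sketch.

```python
def shift_masks(shifts):
    """Map each worker to an int whose bit i is set if the worker was in shift i."""
    masks = {}
    for i, shift in enumerate(shifts):
        for worker in shift:
            masks[worker] = masks.get(worker, 0) | (1 << i)
    return masks


def worked_together(masks, x, y):
    """True if x and y share at least one shift (a single bitwise AND)."""
    return (masks.get(x, 0) & masks.get(y, 0)) != 0


shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
masks = shift_masks(shifts_yesterday)
print(format(masks['b'], '08b'))         # 00000011
print(worked_together(masks, 'b', 'd'))  # True
print(worked_together(masks, 'a', 'f'))  # False
```

This answers pairwise queries quickly, but listing every co-worker of one worker still requires comparing that worker's mask against all the others.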

Answer 4: (score: 1)

I think you're looking for a set membership relationship. Let's call it coworkers:

shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]

def coworkers(worker, shifts):
    coworkers = set()
    coworkers.update( *[shift for shift in shifts if worker in shift] )
    return coworkers

For each worker, you build the union of all the shifts that contain that worker.

everybody = set()
everybody.update( *shifts_yesterday )

for worker in everybody:
    print("{}: {}".format(worker, coworkers(worker, shifts_yesterday)))

Output:

a: set(['a', 'c', 'b', 'd'])
c: set(['a', 'c', 'b', 'e', 'd', 'f'])
b: set(['a', 'c', 'b', 'e', 'd', 'f'])
e: set(['c', 'b', 'e', 'f'])
d: set(['a', 'c', 'b', 'd'])
f: set(['c', 'b', 'e', 'f'])
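
Note that each set in the output above still contains the worker itself, whereas the goal in the question lists only the other workers. One possible variant that removes the worker is sketched below; `coworkers_excluding_self` is a hypothetical name, and the sorting is only there to make the printed order stable.

```python
def coworkers_excluding_self(worker, shifts):
    """Union of every shift containing `worker`, minus the worker."""
    together = set()
    for shift in shifts:
        if worker in shift:
            together.update(shift)
    together.discard(worker)
    return together


shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
everybody = set().union(*shifts_yesterday)

for worker in sorted(everybody):
    others = sorted(coworkers_excluding_self(worker, shifts_yesterday))
    print("{}: {}".format(worker, ", ".join(others)))
```

Because every call scans all shifts, building the full mapping this way costs one pass over the shifts per worker.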