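I'm looking for an algorithm that maps all relationships between all elements belonging to the sublists of a list of length n.
More specifically, assume a, b, c, d, e and f are the names of workers, and each sublist represents a "shift" that happened yesterday. For each worker, I want to know who they worked with yesterday.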
shifts_yesterday = [[a, b, c, d], [b, c, e, f]]
Goal:
a: b, c, d
b: a, c, d, e, f
c: a, b, d, e, f
d: a, b, c
e: b, c, f
f: b, c, e
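Above, I can see that a worked with b, c, d yesterday; b worked with a, c, d, e, f yesterday; and so on.
Time complexity is a concern, because I have a large list to process. Intuitively, though, I suspect the floor for this problem is fairly high...
Note: I could obviously write a direct approach with nested for loops, but that is (a) not very clever and (b) very slow.
Here is a (messy) attempt: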
import collections
shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
workers = [i for s in shifts for i in s]
d = collections.defaultdict(list)
for w in workers:
    for s in shifts:
        for i in s:
            if i != w and w in s:
                if w in d.keys():
                    if i not in d[w]:
                        d[w].append(i)
                else:
                    d[w].append(i)
Test:
for k, v in collections.OrderedDict(sorted(d.items())).items():
    print(k, v)
Timings:
Mine: %%timeit -r 10
-> 10000 loops, best of 10: 19 µs per loop
Padraic Cunningham: %%timeit -r 10
-> 100000 loops, best of 10: 4.89 µs per loop
zvone: %%timeit -r 10
-> 100000 loops, best of 10: 3.88 µs per loop
pneumatics: %%timeit -r 10
-> 10000 loops, best of 10: 33.5 µs per loop
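Answer 0 (score: 3)
from collections import defaultdict
result = defaultdict(set)
for shift in shifts:
    for worker in shift:
        result[worker].update(shift)
# now, result[a] contains: a, b, c, d - so remove the a
# (.items() works on both Python 2 and 3; only the set contents change here)
for k, v in result.items():
    v.remove(k)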
Answer 1 (score: 2)
A simplified and more efficient version of your own code, using sets to store the values and itertools.combinations to pair up the workers:
shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
from itertools import combinations
import collections
d = collections.defaultdict(set)
for sub in shifts:
    for a, b in combinations(sub, 2):
        d[a].add(b)
        d[b].add(a)
for k, v in sorted(d.items()):
    print(k, v)
Which would give you:
('a', set(['c', 'b', 'd']))
('b', set(['a', 'c', 'e', 'd', 'f']))
('c', set(['a', 'b', 'e', 'd', 'f']))
('d', set(['a', 'c', 'b']))
('e', set(['c', 'b', 'f']))
('f', set(['c', 'b', 'e']))
On your small sample input:
In [1]: import collections
In [2]: %%timeit
   ...: shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
   ...: workers = [i for s in shifts for i in s]
   ...: d = collections.defaultdict(list)
   ...: for w in workers:
   ...:     for s in shifts:
   ...:         for i in s:
   ...:             if i != w and w in s:
   ...:                 if w in d.keys():
   ...:                     if i not in d[w]:
   ...:                         d[w].append(i)
   ...:                 else:
   ...:                     d[w].append(i)
   ...:
10000 loops, best of 3: 21.6 µs per loop
In [3]: from itertools import combinations
In [4]: %%timeit
   ...: shifts = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
   ...: d = collections.defaultdict(set)
   ...: for sub in shifts:
   ...:     for a, b in combinations(sub, 2):
   ...:         d[a].add(b)
   ...:         d[b].add(a)
   ...:
100000 loops, best of 3: 4.55 µs per loop
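Since the timings above are for a six-worker sample, it may be worth re-measuring on something closer to the real data. The following is only a rough sketch: the function names by_update and by_combinations and the synthetic workload (200 shifts of 8 workers drawn from a pool of 500) are assumptions made for illustration, not figures from the question.

```python
import random
import timeit
from collections import defaultdict
from itertools import combinations


def by_update(shifts):
    """Set-update approach: add the whole shift, then drop the worker."""
    result = defaultdict(set)
    for shift in shifts:
        for worker in shift:
            result[worker].update(shift)
    for worker, others in result.items():
        others.discard(worker)
    return result


def by_combinations(shifts):
    """Pairwise approach: link both members of every pair in a shift."""
    result = defaultdict(set)
    for shift in shifts:
        for a, b in combinations(shift, 2):
            result[a].add(b)
            result[b].add(a)
    return result


# Hypothetical workload; adjust the sizes to resemble the real data.
rng = random.Random(0)
pool = ["w%d" % i for i in range(500)]
big_shifts = [rng.sample(pool, 8) for _ in range(200)]

for fn in (by_update, by_combinations):
    seconds = timeit.timeit(lambda: fn(big_shifts), number=20)
    print(fn.__name__, round(seconds, 4))
```

The two functions should return the same mapping whenever every shift has at least two distinct workers; a worker who is alone on a shift gets an empty entry from by_update but no entry at all from by_combinations.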
Answer 2 (score: 1)
Pseudocode algorithm:
declare two-dimensional array workers
for each shift in shifts_yesterday
    for each element x in shift
        add x to workers[x]
        for each element y != x in shift
            add y to workers[x]
for each list xs in workers
    print xs[0] + ": "
    for each element w in xs except the first
        print w + ", "
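The time complexity is O(n*m^2 + w*m), where n is the number of shifts, m is the maximum number of workers in any shift, and w is the total number of workers. If you can settle for seeing each pair of workers only once (not showing both a: b and b: a), you can cut out one m. This is a quadratic algorithm, and I believe that is the best you can do.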
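To make the n*m^2 term concrete, here is a minimal Python sketch of the pseudocode that also counts inner-loop steps. It assumes a dict of sets in place of the two-dimensional array, so repeated coworkers are stored once; the name coworker_table and the step counter are illustrative additions.

```python
def coworker_table(shifts):
    """Return ({worker: set of coworkers}, number of inner-loop steps)."""
    table = {}
    steps = 0
    for shift in shifts:                  # n shifts
        for x in shift:                   # up to m workers per shift
            row = table.setdefault(x, set())
            for y in shift:               # up to m again, so m^2 per shift
                steps += 1
                if y != x:
                    row.add(y)
    return table, steps


shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
table, steps = coworker_table(shifts_yesterday)
for worker in sorted(table):
    print(worker + ": " + ", ".join(sorted(table[worker])))
print("inner-loop steps:", steps)
```

For the sample input, n = 2 and m = 4, so the inner loop runs 2 * 4^2 = 32 times.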
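Answer 3 (score: 1)
More conditions should be specified. For example, if the total size of the "shifts_yesterday" array is limited to 64, you can use a long to store one bit per shift for each worker. Then you can answer the question with a single operation: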
a = 00000001
b = 00000011
d = 00000010
f = 00000010
Did b work with d?
((b & d) != 0) : true
Did a work with f?
((a & f) != 0) : false
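The same idea can be sketched in Python, where integers are arbitrary precision and the 64-shift limit of a fixed-width long does not apply. The helper names shift_masks and worked_together are hypothetical, and the masks here are derived from the question's shifts_yesterday (bit i is set when the worker was on shift i), so individual values may differ from the illustrative ones above.

```python
def shift_masks(shifts):
    """Map each worker to an int whose bit i is set if they worked shift i."""
    masks = {}
    for i, shift in enumerate(shifts):
        for worker in shift:
            masks[worker] = masks.get(worker, 0) | (1 << i)
    return masks


def worked_together(masks, x, y):
    """True if x and y share at least one shift (a single AND)."""
    return (masks.get(x, 0) & masks.get(y, 0)) != 0


shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
masks = shift_masks(shifts_yesterday)
print(format(masks['b'], '08b'))          # 00000011
print(worked_together(masks, 'b', 'd'))   # True
print(worked_together(masks, 'a', 'f'))   # False
```

This answers pairwise yes/no queries quickly, but listing all coworkers of one worker still means testing that worker's mask against every other worker's mask.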
Answer 4 (score: 1)
I think you are looking for a set membership relation. Let's call it coworkers:
shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
def coworkers(worker, shifts):
    coworkers = set()
    coworkers.update( *[shift for shift in shifts if worker in shift] )
    return coworkers
For each worker, you build a set from all the shifts that contain that worker.
everybody = set()
everybody.update( *shifts_yesterday )
for worker in everybody:
    print("{}: {}".format(worker, coworkers(worker, shifts_yesterday)))
Output
a: set(['a', 'c', 'b', 'd'])
c: set(['a', 'c', 'b', 'e', 'd', 'f'])
b: set(['a', 'c', 'b', 'e', 'd', 'f'])
e: set(['c', 'b', 'e', 'f'])
d: set(['a', 'c', 'b', 'd'])
f: set(['c', 'b', 'e', 'f'])
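Note that each set in this output includes the worker themselves, whereas the goal in the question lists only the other workers. One possible adjustment, sketched here under the hypothetical name coworkers_excluding_self, is to discard the worker before returning:

```python
def coworkers_excluding_self(worker, shifts):
    """Union of every shift containing `worker`, minus the worker."""
    found = set()
    for shift in shifts:
        if worker in shift:
            found.update(shift)
    found.discard(worker)   # discard() is a no-op if worker was never added
    return found


shifts_yesterday = [['a', 'b', 'c', 'd'], ['b', 'c', 'e', 'f']]
for worker in sorted(set().union(*shifts_yesterday)):
    others = coworkers_excluding_self(worker, shifts_yesterday)
    print("{}: {}".format(worker, ", ".join(sorted(others))))
```

Like the version above, this rescans every shift for each worker, so it does more work than a single pass that builds all the sets at once.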