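I have a list rods made up of tuples of length and position. For a given length, position is always unique. I want to find the most frequent rod length, and then the total number of occurrences of all neighbouring rods (including the most frequent ones) that are unique by position. Broken down:
First, I want to find the most frequent rod length.
Then I want to include all other rods with a neighbouring length by some criterion (in this example, +-1), but only if they have a unique position, that is, one not already accounted for (either by the most frequent rods in the original group, or by new rods added to the group because they satisfy the neighbour criterion).
I was able to accomplish this in the following way, by sorting and using sets, but perhaps there is a better solution: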
import itertools
#tuples of (length, position)
rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7),
(13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14),
(13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21),
(13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21),
(12, 2)]
lengths = [length for length, position in rods]
#gives tuples of lengths and their frequencies:
length_freq = (sorted([(k,len(list(j))) for k,j in itertools.groupby(sorted(lengths))],
key=lambda x: x[1],reverse=1))
best_length = length_freq[0][0]
#cumulative frequency of rods near best_length, with unique position:
tally = (len(set((best_length,v) for j,v in rods
if best_length - 1 <= j <=best_length + 1)))
print(length_freq)
#output:
#[(13, 21), (12, 3), (14, 2), (15, 1), (17, 1), (18, 1)]
print(tally)
#output:
#23
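Note that 23 is the correct answer for this test data: both rods with length=14 sit at positions (21 and 5) that are also occupied by rods with length=12, and at position=21 the rods of lengths 13 and 12 overlap as well.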
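To make the overlaps concrete, one can group the positions by length and intersect the resulting sets. This is only a minimal sketch on the same test data; positions_by_length, near, and the overlap variables are names introduced here for illustration, and best_length is hard-coded to 13 rather than computed.

```python
from collections import defaultdict

# tuples of (length, position), same test data as above
rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7),
        (13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14),
        (13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21),
        (13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21),
        (12, 2)]

# Map each length to the set of positions it occupies.
positions_by_length = defaultdict(set)
for length, position in rods:
    positions_by_length[length].add(position)

best_length = 13  # assumption: the most frequent length, taken from the output above
near = [best_length - 1, best_length, best_length + 1]

# Positions shared between neighbouring lengths are only counted once.
overlap_14_12 = positions_by_length[14] & positions_by_length[12]
overlap_13_12 = positions_by_length[13] & positions_by_length[12]
unique_positions = set().union(*(positions_by_length[l] for l in near))

print(sorted(overlap_14_12))  # [5, 21]
print(sorted(overlap_13_12))  # [21]
print(len(unique_positions))  # 23
```

The 21 positions of length 13 plus positions 2 and 5 from length 12 give 23; the two length-14 rods add nothing new.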
Answer 0 (score: 2)
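I think yours is a reasonable solution overall, if a bit over-compressed. My main suggestion would be to break it up a bit. Also, rather than using groupby here, it's better to use Counter if possible, or a defaultdict if not. groupby is meant for lazy operations on pre-sorted material; if the input isn't pre-sorted and you don't need laziness, you probably shouldn't use it.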
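A small illustration of why groupby depends on sorted input may help. This is a sketch on made-up sample data (the short lengths list is not from the question), comparing groupby on unsorted input, groupby after sorting, and Counter:

```python
import itertools
from collections import Counter

lengths = [13, 12, 13, 14, 13, 12]  # deliberately unsorted sample data

# groupby only merges *adjacent* equal items, so unsorted input is fragmented.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(lengths)]
print(unsorted_groups)  # [(13, 1), (12, 1), (13, 1), (14, 1), (13, 1), (12, 1)]

# Sorting first gives correct groups, at the cost of an O(n log n) sort.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(lengths))]
print(sorted_groups)  # [(12, 2), (13, 3), (14, 1)]

# Counter counts in a single pass and needs no sort.
counts = Counter(lengths)
print(counts.most_common(1))  # [(13, 3)]
```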
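Since Nolen Royalty has provided a defaultdict-based solution, I'll use a Counter here, but see below for a drop-in replacement. The result is an O(n) algorithm; since yours sorts, it is O(n log n), so this is a slight improvement.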
import collections
#tuples of (length, position)
rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7),
(13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14),
(13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21),
(13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21),
(12, 2)]
lengths = (length for length, position in rods)
length_freq = collections.Counter(lengths)
((best_length, _),) = length_freq.most_common(1)
print(best_length)
#cumulative frequency of rods near best_length, with unique position:
rod_filter = ((l, p) for l, p in rods if best_length - 1 <= l <= best_length + 1)
tally = len(set((best_length, p) for l, p in rod_filter))
print(length_freq)
print(tally)
Since you can't use Counter, here's an alternative for completeness. It's a drop-in replacement for these two lines:
length_freq = collections.Counter(lengths)
((best_length, _),) = length_freq.most_common(1)
Just replace them with the following:
length_freq = collections.defaultdict(int)
for l in lengths:
    length_freq[l] += 1
best_length = max(length_freq, key=length_freq.get)
Also note that my earlier code had an error; it's fixed now.
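As a quick sanity check, the two variants can be run side by side on the same lengths. This sketch assumes an interpreter where both Counter and defaultdict are available; the names counter_freq, dict_freq, best_from_counter, and best_from_dict are introduced here for illustration only.

```python
import collections

# Lengths from the test data: one 18, one 17, one 15, two 14s,
# twenty-one 13s and three 12s.
lengths = [18, 17, 15, 14, 14] + [13] * 21 + [12, 12, 12]

# Variant 1: Counter and most_common
counter_freq = collections.Counter(lengths)
((best_from_counter, _),) = counter_freq.most_common(1)

# Variant 2: defaultdict and max over the keys
dict_freq = collections.defaultdict(int)
for l in lengths:
    dict_freq[l] += 1
best_from_dict = max(dict_freq, key=dict_freq.get)

print(dict(counter_freq) == dict(dict_freq))  # True
print(best_from_counter == best_from_dict)    # True
print(best_from_counter)                      # 13
```

With this data there is a single clear winner. If several lengths tie for most frequent, which one comes back is not something I would rely on in either variant.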
Answer 1 (score: 1)
Here's a pretty straightforward approach that seems reasonable to me:
>>> from collections import defaultdict
>>> rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7),
... (13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14),
... (13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21),
... (13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21),
... (12, 2)]
>>> neighbor_cutoff = 1
>>> length_to_count = defaultdict(int)
>>> neighbors_for_length = defaultdict(set)
>>> for rod in rods:
...     length_to_count[rod[0]] += 1
...     neighbors_for_length[rod[0]].add(rod[1])
...     for i in range(1, neighbor_cutoff+1):
...         neighbors_for_length[rod[0]-i].add(rod[1])
...         neighbors_for_length[rod[0]+i].add(rod[1])
...
>>> sorted([(length, length_to_count[length]) for length in length_to_count], key=lambda x: x[1], reverse=True)
[(13, 21), (12, 3), (14, 2), (15, 1), (17, 1), (18, 1)]
>>> [(length, len(neighbors_for_length[length])) for length in neighbors_for_length]
[(11, 3), (12, 23), (13, 23), (14, 23), (15, 3), (16, 2), (17, 2), (18, 2), (19, 1)]
>>> sorted(_, key=lambda x: x[1], reverse=True)
[(12, 23), (13, 23), (14, 23), (11, 3), (15, 3), (16, 2), (17, 2), (18, 2), (19, 1)]
>>> neighbors_for_length
defaultdict(<type 'set'>, {11: set([2, 5, 21]), 12: set([2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]),
13: set([2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]),
14: set([3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]),
15: set([3, 21, 5]), 16: set([2, 3]), 17: set([2, 21]), 18: set([2, 21]), 19: set([21])})
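If only the tally for the most frequent length is needed, the ideas from this thread can be folded into one small function. The following is a sketch, not code from either answer: tally_near_most_common is a hypothetical helper name, it borrows the neighbor_cutoff parameter from the session above, and it assumes Counter is available and that rods is non-empty.

```python
from collections import Counter

def tally_near_most_common(rods, neighbor_cutoff=1):
    """Return (best_length, number of unique positions among rods whose
    length is within neighbor_cutoff of best_length).

    Hypothetical helper; raises ValueError if rods is empty.
    """
    length_freq = Counter(length for length, _ in rods)
    ((best_length, _),) = length_freq.most_common(1)
    # A set keeps each position once, however many lengths share it.
    positions = set(
        position for length, position in rods
        if abs(length - best_length) <= neighbor_cutoff
    )
    return best_length, len(positions)

rods = [(18, 21), (17, 2), (15, 3), (14, 21), (14, 5), (13, 6), (13, 7),
        (13, 8), (13, 9), (13, 10), (13, 11), (13, 12), (13, 13), (13, 14),
        (13, 15), (13, 16), (13, 17), (13, 18), (13, 19), (13, 20), (13, 21),
        (13, 22), (13, 23), (13, 24), (13, 25), (13, 26), (12, 5), (12, 21),
        (12, 2)]

print(tally_near_most_common(rods))                     # (13, 23)
print(tally_near_most_common(rods, neighbor_cutoff=0))  # (13, 21)
print(tally_near_most_common(rods, neighbor_cutoff=2))  # (13, 24)
```

Unlike the session above, this does not build the neighbour sets for every length, so it does less work when only the best length matters.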