我想对列表中重复值之间的所有间隔进行直方图。我编写了一些有效的代码,但是它使用if语句的for循环。我经常发现如果可以设法使用聪明的切片和/或预定义的python(numpy)方法编写版本,那么可以获得比使用for循环更快的Python代码,但在这种情况下我无法想到任何方式这样做。任何人都可以建议更快或更pythonic的方式吗?
# make a 'histogram'/count of all the intervals between repeated values
def hist_intervals(a):
values = sorted(set(a)) # get list of which values are in a
# setup the dict to hold the histogram
hist, last_index = {}, {}
for i in values:
hist[i] = {}
last_index[i] = -1 # some default value
# now go through the array and find intervals
for i in range(len(a)):
val = a[i]
if last_index[val] != -1: # do nothing if it's the first time
interval = i - last_index[val]
if interval in hist[val]:
hist[val][interval] += 1
else:
hist[val][interval] = 1
last_index[val] = i
return hist
# example list/array
a = [1,2,3,1,5,3,2,4,2,1,5,3,3,4]
histdict = hist_intervals(a)
print("histdict = ",histdict)
# correct answer for this example
answer = { 1: {3:1, 6:1},
2: {2:1, 5:1},
3: {1:1, 3:1, 6:1},
4: {6:1},
5: {6:1}
}
print("answer = ",answer)
示例输出:
histdict = {1: {3: 1, 6: 1}, 2: {5: 1, 2: 1}, 3: {3: 1, 6: 1, 1: 1}, 4: {6: 1}, 5: {6: 1}}
answer = {1: {3: 1, 6: 1}, 2: {2: 1, 5: 1}, 3: {1: 1, 3: 1, 6: 1}, 4: {6: 1}, 5: {6: 1}}
^注意:我不关心dict中的顺序,所以这个解决方案是可以接受的,但我希望能够在真正的大型数组/列表上运行,我怀疑我目前的方法会很慢。
答案 0 :(得分:1)
您可以通过精心构造的defaultdict
消除设置循环。然后,你只需要在输入列表上进行一次扫描,就可以了。在这里,我将结果defaultdict
更改回常规Dict[int, Dict[int, int]]
,但这样就可以很好地打印出来。
from collections import defaultdict
def count_intervals(iterable):
# setup
last_seen = {}
hist = defaultdict(lambda: defaultdict(int))
# The actual work
for i, x in enumerate(iterable):
if x in last_seen:
hist[x][i-last_seen[x]] += 1
last_seen[x] = i
return hist
a = [1,2,3,1,5,3,2,4,2,1,5,3,3,4]
hist = count_intervals(a)
for k, v in hist.items():
print(k, dict(v))
# 1 {3: 1, 6: 1}
# 3 {3: 1, 6: 1, 1: 1}
# 2 {5: 1, 2: 1}
# 5 {6: 1}
# 4 {6: 1}
答案 1 :(得分:0)
在数据结构方面有一个明显的变化。而不是使用hist
的词典字典使用defaultdict
Counter
这使代码成为
from collections import defaultdict, Counter
# make a 'histogram'/count of all the intervals between repeated values
def hist_intervals(a):
values = sorted(set(a)) # get list of which values are in a
# setup the dict to hold the histogram
hist, last_index = defaultdict(Counter), {}
# now go through the array and find intervals
for i, val in enumerate(a):
if val in last_index
interval = i - last_index[val]
hist[val].update((interval,))
last_index[val] = i
return hist
这会更快,因为if
用C语言编写,也会更清晰。