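I am trying to extend a function I wrote that finds the first 3 values of a dictionary that are "close enough" together and also all under a threshold (here N = 70):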
d = {
    1: {0: 222, 2: 44, 18: 44, 20: 22, 21: 72, 105: 22, 107: 9, 115: 66},
    2: {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105: 22, 107: 9, 115: 24, 214: 22, 206: 9, 225: 241, 412: 83.0, 364: 68.0, 682: 64.0, 172: 58.0}
}  # nested dictionary

def ff(d):
    G = []
    for k, v in sorted(d.iteritems()):
        G.append((k, v))
    #print G
    for i in range(len(G)-2):
        if (G[i+2][0] - G[i][0] < 20) & (G[i][1] <= 70) & (G[i+1][1] <=70) & (G[i+2][1]<=70):
            return i, G[i], G[i+1], G[i+2]

for idnum, ds in sorted(d.iteritems()):
    print ff(ds)
Output:
[(0, 222), (2, 44), (18, 44), (20, 22), (21, 72), (105, 22), (107, 9), (115, 66)]
(1, (2, 44), (18, 44), (20, 22))
[(0, 61.0), (105, 22), (107, 9), (115, 24), (172, 58.0), (206, 9), (214, 22), (225, 241), (364, 68.0), (412, 83.0), (682, 64.0), (993, 65.0), (1069, 48.0), (1133, 84.0)]
(1, (105, 22), (107, 9), (115, 24)) #first interval fitting criteria
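What I would like to do is actually find all windows of length 20 and keep track of how many values <= 70 each one has. Any ideas on how to get started would be great. I can't seem to figure out how to change the condition that steps with "i":
if (G[i+2][0] - G[i][0] < 20) & (G[i][1] <= 70) & (G[i+1][1] <=70) & (G[i+2][1]<=70):
into something based on a length of 20 rather than on the index?
Eventually, instead of the "first three", I would like to track all higher frequencies as well, with the minimum being "at least 3 values <= 70, sequentially ordered, within an interval of length 20".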
Desired output:
If we had
d[3] = {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105: 22, 107: 9, 115: 24, 117: 22, 200: 100, 225: 241, 412: 83.0, 420: 68.0, 423: 64.0, 430: 58.0}
it would produce the output:
[(105, 22), (107, 9), (115, 24), (117, 22)], [(420, 68.0), (423, 64.0), (430, 58.0)]
# These can be of any length as long as the overall interval of the list is <=20.
Answer 0 (score: 1)
This might help you get started. It is loop-based and doesn't even use zip (let alone itertools.takewhile!), but hopefully it makes sense:
def find_windows(d, min_elements=3, upper_length=20, max_value=70):
    G = sorted(d.items())
    for start_index in range(len(G)):
        for width in range(min_elements, len(G)-start_index+1):
            window = G[start_index:start_index+width]
            if not all(v <= max_value for k,v in window):
                break
            if not window[-1][0] - window[0][0] < upper_length:
                break
            yield window
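I use break because as soon as a window contains any value > max_value, or its span reaches upper_length (>= upper_length), no more windows starting at start_index are possible: widening the window can only keep the offending value or increase the span.
If you haven't seen yield before, it turns a function into a generator function; it works like a return, except that the function sends back (yields) a value and can then continue rather than stop. (See the answers to this question for details.)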
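To make the yield behaviour concrete, here is a tiny sketch in Python 3 syntax. The name countdown is made up for illustration and is not part of the code above:

```python
def countdown(n):
    """Yield n, n-1, ..., 1, pausing after each value."""
    while n > 0:
        yield n   # hand one value to the caller, then resume here next time
        n -= 1

gen = countdown(3)
first = next(gen)   # runs the body only up to the first yield
rest = list(gen)    # resumes after that yield and collects what is left
print(first, rest)  # 3 [2, 1]
```

find_windows works the same way, handing back one window per yield, as the session below shows.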
>>> Ds = {
...     1: {0: 222, 2: 44, 18: 44, 20: 22, 21: 72, 105: 22, 107: 9, 115: 66},
...     2: {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105: 22, 107: 9, 115: 24, 214: 22, 206: 9, 225: 241, 412: 83.0, 364: 68.0, 682: 64.0, 172: 58.0}
... }
>>>
>>> for idnum, d in sorted(Ds.items()):
...     print idnum, list(find_windows(d))
...
1 [[(2, 44), (18, 44), (20, 22)], [(105, 22), (107, 9), (115, 66)]]
2 [[(105, 22), (107, 9), (115, 24)]]
>>> mydict = dict([(0,55),(1,55),(2,55),(3,55)])
>>>
>>> for window in find_windows(mydict):
...     print window
...
[(0, 55), (1, 55), (2, 55)]
[(0, 55), (1, 55), (2, 55), (3, 55)]
[(1, 55), (2, 55), (3, 55)]
>>> list(find_windows(mydict))
[[(0, 55), (1, 55), (2, 55)], [(0, 55), (1, 55), (2, 55), (3, 55)], [(1, 55), (2, 55), (3, 55)]]
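I'm still not entirely clear on what you want to do about overlapping windows, but at the moment it finds all of them, and you can decide how to handle them either inside the function or in post-processing.
Modifying the code to count the values that are <= max_value, rather than test whether all of them are, should be straightforward, so I'll leave that alone.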
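For what it's worth, here is one way the counting variant might look. This is only a sketch in Python 3 syntax: count_windows and min_count are made-up names, and it assumes a window is every key less than upper_length away from its start key, with values above max_value allowed inside the window but not counted.

```python
def count_windows(d, min_count=3, upper_length=20, max_value=70):
    """Yield (window, count) for each start key whose window has enough low values.

    count is the number of values in the window that are <= max_value.
    """
    G = sorted(d.items())
    for start_index, (start_key, _) in enumerate(G):
        window = []
        for k, v in G[start_index:]:
            if k - start_key >= upper_length:
                break  # keys are sorted, so nothing later can fit either
            window.append((k, v))
        count = sum(1 for _, v in window if v <= max_value)
        if count >= min_count:
            yield window, count

d1 = {0: 222, 2: 44, 18: 44, 20: 22, 21: 72, 105: 22, 107: 9, 115: 66}
results = list(count_windows(d1))
for window, count in results:
    print(count, window)
# 3 [(2, 44), (18, 44), (20, 22), (21, 72)]
# 3 [(105, 22), (107, 9), (115, 66)]
```

Note that the first window now includes (21, 72): it lies within 20 of key 2 but is not counted. If windows must contain only values <= max_value, the all(...) test in find_windows is the right tool instead.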
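Answer 1 (score: 1)
I split the problem into two parts. The first generator splits your ds dictionary into ordered lists of (key, value) pairs such that no list contains a value > 70. Along the way, I discard chunks with fewer than 3 items.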
def split_iter(d, limit=70):
    G = sorted(d.iteritems())
    start = 0
    for i, (k, v) in enumerate(G):
        if v > limit:
            if i - start >= 3:
                yield G[start:i]
            start = i + 1
    G_tail = G[start:]
    if len(G_tail) >= 3:
        yield G_tail
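As an aside, the same splitting step can also be sketched with itertools.groupby, which groups consecutive items sharing a key. This is a swapped-in alternative written in Python 3 syntax, not the code above; split_chunks and min_items are made-up names.

```python
from itertools import groupby

def split_chunks(d, limit=70, min_items=3):
    """Yield runs of consecutive sorted (key, value) pairs with all values <= limit."""
    G = sorted(d.items())
    # groupby starts a new group each time the "value <= limit" flag flips
    for ok, group in groupby(G, key=lambda kv: kv[1] <= limit):
        chunk = list(group)
        if ok and len(chunk) >= min_items:
            yield chunk

d1 = {0: 222, 2: 44, 18: 44, 20: 22, 21: 72, 105: 22, 107: 9, 115: 66}
chunks = list(split_chunks(d1))
print(chunks)
# [[(2, 44), (18, 44), (20, 22)], [(105, 22), (107, 9), (115, 66)]]
```

The explicit loop avoids building the discarded groups, so which version to prefer is mostly a matter of taste.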
Now I'll use bisect_right from the bisect module to quickly find the largest possible window starting at each item:
from bisect import bisect_right

def ff(d):
    for chunk in split_iter(d):
        last_end_i = 0
        for i, (k, v) in enumerate(chunk):
            end_i = bisect_right(chunk, (k + 20, 0))
            if end_i - i < 3:
                continue
            if last_end_i != end_i:
                yield chunk[i:end_i]
                last_end_i = end_i
            if end_i == len(chunk):
                break
As you can see, I only yield the largest possible windows. Now let's put it all together:
for idnum, ds in sorted(d.iteritems()):
    for r in ff(ds):
        print idnum, repr(r)
Hopefully I got it right. The output looks like this:
1 [(2, 44), (18, 44), (20, 22)]
1 [(105, 22), (107, 9), (115, 66)]
2 [(105, 22), (107, 9), (115, 24)]
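As a sanity check against the d[3] example from the question, here is a self-contained Python 3 port of the two generators. It is a sketch: the length and min_items parameters are additions for illustration (the defaults reproduce the hard-coded 20 and 3), and the bisect_right probe assumes all values are positive.

```python
from bisect import bisect_right

def split_iter(d, limit=70, min_items=3):
    """Yield runs of consecutive sorted (key, value) pairs with all values <= limit."""
    G = sorted(d.items())
    start = 0
    for i, (k, v) in enumerate(G):
        if v > limit:
            if i - start >= min_items:
                yield G[start:i]
            start = i + 1
    if len(G) - start >= min_items:
        yield G[start:]

def ff(d, length=20, min_items=3):
    """Yield the maximal windows whose keys span less than `length`."""
    for chunk in split_iter(d, min_items=min_items):
        last_end_i = 0
        for i, (k, v) in enumerate(chunk):
            # (k + length, 0) sorts before (k + length, any positive value),
            # so end_i is one past the last pair whose key is < k + length.
            end_i = bisect_right(chunk, (k + length, 0))
            if end_i - i < min_items:
                continue
            if last_end_i != end_i:
                yield chunk[i:end_i]
                last_end_i = end_i
            if end_i == len(chunk):
                break

d3 = {0: 61.0, 993: 65.0, 1133: 84.0, 1069: 48.0, 105: 22, 107: 9, 115: 24,
      117: 22, 200: 100, 225: 241, 412: 83.0, 420: 68.0, 423: 64.0, 430: 58.0}
windows = list(ff(d3))
print(windows)
# [[(105, 22), (107, 9), (115, 24), (117, 22)], [(420, 68.0), (423, 64.0), (430, 58.0)]]
```

That matches the two lists the question asks for with d[3].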