I have a sorted list, actually a huge array of (x, y, z) triples sorted by x. My goal is to break it into chunks according to ranges of x. I have been trying:
for triple in hugelist:
    while triple[0] >= minx and triple[0] < maxx:
        #do some stuff
    # when out of that range, increase endpoints to the next range
    minx = minx + deltax
    maxx = maxx + deltax
    # do some other stuff
    # and hopefully move to next triple
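Of course this doesn't work, because I am misusing the while loop, and I understand why. What I can't figure out is how to walk through the list. hugelist holds about 2 million triples that need to be split into about 600 chunks. If possible, I'd like to go through it in order just once.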
==============================
With Tim's help, and using a 291-point mini list, bisect misses the place where maxx should go:
import bisect

while xstart < len(heights):
    xfinish = bisect.bisect_left(heights, (maxx, 0, 0), lo=xstart)
    xslice = heights[xstart:xfinish]
    print("xstart is ", xstart, " xfinish is ", xfinish)
    print("maxx is ", maxx, " xslice is ", xslice)
    maxx += deltax
    xstart = xfinish
xstart is 0 xfinish is 291
maxx is 804.0 xslice is [(803.01, 1941.84, 0.74) (803.04, 1941.88, 0.45) (803.06, 1941.25, 0.0)
(803.07, 1941.01, 0.0) (803.07, 1941.52, 0.31) (803.09, 1941.16, 0.08)
(803.12, 1940.05, 0.0) (803.13, 1939.72, 0.3) (803.13, 1939.86, 0.11)
(803.13, 1940.29, 0.17) . . . (803.23, 1938.24, 0.2)
(803.23, 1938.25, 0.45) (803.23, 1938.29, 0.1) (803.23, 1938.36, 0.0)
(803.23, 1938.49, 0.0) (803.96, 1941.06, 4.21) (**803.98**, 1940.6, 4.55)
(**804.0**, 1940.32, 4.49) (**804.01**, 1940.68, 4.6) . . . (806.11, 1934.82, 10.64)
(806.11, 1934.86, 10.65) (806.11, 1934.91, 10.56) (806.32, 1933.24, 4.69)]
Answer 0 (score: 2)
Here is a different, more efficient approach that takes advantage of the list being sorted:
from bisect import bisect_left
istart = 0
while istart < len(hugelist):
    ifinish = bisect_left(hugelist, (maxx, 0, 0), lo=istart)
    # Now work on the slice hugelist[istart:ifinish].
    # It's possible that istart == ifinish, i.e. that the
    # slice is empty!
    maxx += deltax
    istart = ifinish
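Using binary search cuts down the number of comparisons needed.
EDIT: from the comments:
It becomes very clear if you think of list indices as pointing between elements, with 0 to the "left" of the leftmost element and len(hugelist) to the "right" of the rightmost element. Then bisect_left() returns the position immediately before the first triple that compares >= (maxx, 0, 0), which is the first triple whose first element is >= maxx. (Tuple comparison is lexicographic, so a triple with x equal to maxx and a negative y would still sort before the probe.)
An example should really help: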
from bisect import bisect_left

hugelist = [(0,0,0), (1,0,0), (3,0,0), (4,1,1), (4,2,2), (5,0,0)]
maxx = 0
deltax = 1
istart = 0
while istart < len(hugelist):
    ifinish = bisect_left(hugelist, (maxx, 0, 0), lo=istart)
    # Now work on the slice hugelist[istart:ifinish].
    # It's possible that istart == ifinish, i.e. that the
    # slice is empty!
    print("for maxx =", maxx, hugelist[istart:ifinish])
    maxx += deltax
    istart = ifinish
Output:
for maxx = 0 []
for maxx = 1 [(0, 0, 0)]
for maxx = 2 [(1, 0, 0)]
for maxx = 3 []
for maxx = 4 [(3, 0, 0)]
for maxx = 5 [(4, 1, 1), (4, 2, 2)]
for maxx = 6 [(5, 0, 0)]
That mostly shows the end cases, which are what any sane reader would be worried about ;-)
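As a sketch only, the same loop can be wrapped in a generator. The name chunks_by_x and its parameters are made up for this illustration and are not part of the answer above. It also swaps the (maxx, 0, 0) probe for the 1-tuple (maxx,), which sorts before every triple whose x equals maxx, so negative y or z values cannot shift the cut point. This assumes the data is a plain Python list of tuples.

```python
from bisect import bisect_left

def chunks_by_x(triples, first_maxx, deltax):
    """Yield (maxx, chunk) pairs for consecutive x ranges of width deltax.

    Hypothetical helper: `triples` must be a list of tuples sorted by x.
    """
    istart = 0
    maxx = first_maxx
    while istart < len(triples):
        # (maxx,) is a prefix of any triple with x == maxx, so it compares
        # smaller than all of them regardless of their y and z values.
        ifinish = bisect_left(triples, (maxx,), lo=istart)
        yield maxx, triples[istart:ifinish]  # the chunk may be empty
        maxx += deltax
        istart = ifinish

data = [(0, 5, 1), (1, -2, 0), (1, 3, 2), (2, -7, 4)]
result = list(chunks_by_x(data, 1, 1))
for upper, chunk in result:
    print("x <", upper, chunk)
```

With the (maxx, 0, 0) probe, the triple (1, -2, 0) in this sample would land in the first chunk, because it compares smaller than (1, 0, 0).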
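Answer 1 (score: 1)
You only need an if to check whether triple[0] is within the desired range. No inner loop is needed. Since the list is sorted by x, there is no need to compare against the minimum; just check whether the value is below the maximum.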
for triple in hugelist:
    if triple[0] < maxx:
        pass  # do some stuff
    else:
        # this triple is the first one in the next range
        maxx = maxx + deltax
        # do some other stuff
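A runnable sketch of this single-pass idea follows. The function name split_single_pass and its parameters are hypothetical, chosen only for this example. Unlike the outline above, it uses a small inner while so that a gap wider than deltax produces empty chunks instead of leaving maxx behind, and it flushes the last chunk after the loop.

```python
def split_single_pass(triples, first_maxx, deltax):
    """Return one list per x range, walking the sorted data exactly once.

    Hypothetical helper: assumes `triples` is sorted by its first element.
    """
    chunks = []
    current = []
    maxx = first_maxx
    for triple in triples:
        # Advance the upper bound until it covers this triple.
        # Ranges that contain no data are emitted as empty lists.
        while triple[0] >= maxx:
            chunks.append(current)
            current = []
            maxx += deltax
        current.append(triple)
    chunks.append(current)  # the final chunk is never closed inside the loop
    return chunks

data = [(0.2, 0, 0), (0.7, 0, 0), (1.1, 0, 0), (3.5, 0, 0)]
chunks = split_single_pass(data, 1.0, 1.0)
print([len(c) for c in chunks])
```

Each triple is visited once, and the inner while runs at most once per range in total, so the work stays linear in the number of triples plus the number of ranges.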
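Depending on your purpose, you could also look at itertools.groupby.
EDIT: if, as you say in the comments, the goal is to get the variance of the z values within each range, you could do something like this: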
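from numpy import var  # assumption: var is numpy.var; any variance function works

z_variances = []
z_group = []
maxx = deltax
for x, y, z in huge_list:
    if x < maxx:
        z_group.append(z)
    else:
        # x crossed the upper bound: close this group and start the next one
        z_variances.append(var(z_group))
        z_group = [z]
        maxx += deltax
# the loop never emits the last group, so flush it here
if z_group:
    z_variances.append(var(z_group))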
Or using groupby:
import itertools

z_variances = []
for _, group in itertools.groupby(huge_list, lambda t: int(t[0] / deltax)):
    z_variances.append(var([z for x, y, z in group]))
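For a version that runs without numpy, here is a minimal sketch using only the standard library. The helper name z_variance_by_range is invented for this example, and statistics.pvariance (population variance) is an assumed stand-in for var; it matches the default behaviour of numpy.var. The key uses math.floor so that negative x values also land in the right range.

```python
import itertools
import math
import statistics

def z_variance_by_range(triples, deltax):
    """Map each x-range index to the population variance of its z values.

    Hypothetical helper: `triples` must already be sorted by x, because
    groupby only merges neighbouring items that share a key.
    """
    variances = {}
    by_range = itertools.groupby(triples, key=lambda t: math.floor(t[0] / deltax))
    for bucket, group in by_range:
        variances[bucket] = statistics.pvariance([z for _, _, z in group])
    return variances

data = [(0.1, 0, 1.0), (0.4, 0, 3.0), (1.2, 0, 2.0), (1.9, 0, 2.0), (3.0, 0, 5.0)]
result = z_variance_by_range(data, 1.0)
print(result)
```

Ranges with no data simply do not appear in the result, which is why the sketch returns a dict keyed by range index rather than a flat list.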
Answer 2 (score: 1)
First, create a sample numpy array:
>>> import numpy as np
>>> alen=300000
>>> huge=np.arange(alen).reshape(alen//3,3)
>>> huge
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[299991, 299992, 299993],
[299994, 299995, 299996],
[299997, 299998, 299999]])
This syntax gives you the first column:
>>> huge[:,0]
array([ 0, 3, 6, ..., 299991, 299994, 299997])
Since you state that the array is sorted, you can use numpy.searchsorted to split the larger array into buckets.
Let's split it into thirds:
>>> minx=huge[-1][0]//3
>>> maxx=huge[-1][0]*2//3
>>> minx
99999
>>> maxx
199998
Then use np.searchsorted to find the row indices where those boundary values fall in the first column:
>>> np.searchsorted(huge[:,0],[minx,maxx])
array([33333, 66666])
Then slice huge into the desired buckets:
>>> buckets=np.searchsorted(huge[:,0],[minx,maxx])
>>> bucket1=huge[0:buckets[0]]
>>> bucket2=huge[buckets[0]:buckets[1]]
>>> bucket3=huge[buckets[1]:]
>>> bucket1
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
...,
[99990, 99991, 99992],
[99993, 99994, 99995],
[99996, 99997, 99998]])
>>> bucket2
array([[ 99999, 100000, 100001],
[100002, 100003, 100004],
[100005, 100006, 100007],
...,
[199989, 199990, 199991],
[199992, 199993, 199994],
[199995, 199996, 199997]])
>>> bucket3
array([[199998, 199999, 200000],
[200001, 200002, 200003],
[200004, 200005, 200006],
...,
[299991, 299992, 299993],
[299994, 299995, 299996],
[299997, 299998, 299999]])
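The three-bucket example extends to many equal-width ranges. The following is a sketch under assumed values: a small 10-row array and deltax = 9 are chosen only to keep the output readable. It builds all boundary values at once, looks them up with one np.searchsorted call, and cuts the array with np.split.

```python
import numpy as np

# Small stand-in for the huge array: first column is 0, 3, 6, ..., 27.
huge = np.arange(30).reshape(10, 3)
deltax = 9  # assumed range width for this illustration

# Boundary x values: 9, 18, 27.
edges = np.arange(deltax, huge[-1, 0] + 1, deltax)

# Row index of the first triple with x >= each boundary.
cuts = np.searchsorted(huge[:, 0], edges)

# np.split cuts before each index, giving len(cuts) + 1 buckets.
buckets = np.split(huge, cuts)

for b in buckets:
    print(b[:, 0])
```

Basic slicing of a numpy array returns views, so the buckets here should not copy the underlying data.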
You can also use np.histogram:
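>>> # np.histogram returns (counts, bin_edges); the bin edges are x values,
>>> # not row indices, so build the row offsets from the cumulative counts
>>> counts=np.histogram(huge[:,0],[0,minx,maxx,huge[-1][0]])[0]
>>> edges=np.concatenate(([0],np.cumsum(counts)))
>>> b1=huge[edges[0]:edges[1]]
>>> b2=huge[edges[1]:edges[2]]
>>> b3=huge[edges[2]:edges[3]]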
Answer 3 (score: 0)
If you want "up to x", use itertools.takewhile:
import itertools
li = [(1,2,3),(4,5,6),(7,8,9),(10,11,12),(13,14,15)]
list(itertools.takewhile(lambda x: x[0] < 10,li))
Out[78]: [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
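One caveat if you are tempted to call takewhile repeatedly on a shared iterator to peel off chunk after chunk: takewhile has to read the first failing element in order to stop, and that element is then gone from the iterator. A small sketch of the effect, reusing the li list from above:

```python
import itertools

li = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

it = iter(li)
first = list(itertools.takewhile(lambda t: t[0] < 10, it))

# (10, 11, 12) was consumed by takewhile when the predicate failed,
# so it appears in neither `first` nor `rest`.
rest = list(it)

print(first)
print(rest)
```

This is why takewhile suits a single "up to x" cut on its own, while splitting a whole sequence into ranges is easier with groupby.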
If you want to assign groups across the whole collection, that is what itertools.groupby is for:
def grouper(x):
    if x < 5:
        return 0
    if x < 11:
        return 1
    return 2
for i,g in itertools.groupby(li,lambda x: grouper(x[0])):
    print('group {}: {}'.format(i,list(g)))
group 0: [(1, 2, 3), (4, 5, 6)]
group 1: [(7, 8, 9), (10, 11, 12)]
group 2: [(13, 14, 15)]