Suppose I have a set of ~100,000 distinct numbers. Some are sequential, some are not.
To illustrate the problem, a small subset of these numbers might be:
(a){1,2,3,4,5,6,7,8,9,11,13,15,45,46,47,3467}
A compact way to write this subset is:
(b)1:9:1,11:15:2,45:47:1,3467
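This is effectively an extended version of the slice notation used by Python and MATLAB, with the stop value included.
My question is: how can I efficiently derive a list in the latter notation from a list of the former kind in Python?
That is, given (a), how do I efficiently obtain (b) in Python?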
Answer 0 (score: 1)
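Disclaimer: I misread the question and thought you wanted to go from the slice notation to the set, so this does not actually answer your question, but I think it is worth leaving posted. It also seems that numpy.r_ does the same (or at least a very similar) thing.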
First, note that if you are using Python 3.5+ (PEP 448), you can use * unpacking inside a set literal:
>>> {*range(1,9), *range(11,15,2), *range(45,47), 3467}
{1, 2, 3, 4, 5, 6, 7, 8, 11, 3467, 13, 45, 46}
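Note that range() excludes its stop value, so the set above is missing 9, 15 and 47 compared with (a). A minimal sketch, assuming the stops in the (b) notation are meant to be inclusive, adds 1 to each stop:

```python
# Assumption: the stops in the (b) notation are inclusive, so each range stop is bumped by 1.
expected = {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 45, 46, 47, 3467}

# Stops taken literally from (b): range() leaves the last value of each run out.
exclusive = {*range(1, 9), *range(11, 15, 2), *range(45, 47), 3467}

# Stops bumped by 1 so that 9, 15 and 47 are included.
inclusive = {*range(1, 9 + 1), *range(11, 15 + 1, 2), *range(45, 47 + 1), 3467}

# The values lost by the literal translation.
missing = sorted(expected - exclusive)
```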
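Otherwise, the notation 11:15:2 is only valid inside the square brackets of an indexing operation, where it is passed to an object's __getitem__ or __setitem__, so you just need to set up an object that builds the set: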
def slice_to_range(slice_obj):
    """Convert a slice with an explicit stop into the equivalent range."""
    assert isinstance(slice_obj, slice)
    assert slice_obj.stop is not None, "cannot have stop of None"
    start = slice_obj.start or 0
    stop = slice_obj.stop
    step = slice_obj.step or 1
    return range(start, stop, step)

class Slice_Set_Creator:
    def __getitem__(self, item):
        # a single index such as creator[3467] or creator[1:9] is not wrapped in a tuple
        if not isinstance(item, tuple):
            item = (item,)
        my_set = set()
        for part in item:
            if isinstance(part, slice):
                my_set.update(slice_to_range(part))
            else:
                my_set.add(part)
        return my_set

slice_set_creator = Slice_Set_Creator()
desired_set = slice_set_creator[1:9:1, 11:15:2, 45:47:1, 3467]

>>> desired_set
{1, 2, 3, 4, 5, 6, 7, 8, 11, 3467, 13, 45, 46}
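If the notation arrives as a string rather than as literal index syntax, a small parser can do the same expansion. The following is only a sketch: parse_inclusive is a hypothetical helper, and it assumes that stops are inclusive as in the question's (b) and that steps are positive.

```python
def parse_inclusive(spec):
    """Expand a string such as '1:9:1,11:15:2,3467' into a set of ints.

    Assumption: 'start:stop[:step]' includes stop, as in the question's (b),
    and step is a positive integer.
    """
    numbers = set()
    for token in spec.split(","):
        parts = [int(p) for p in token.strip().split(":")]
        if len(parts) == 1:
            # a bare number
            numbers.add(parts[0])
        elif len(parts) in (2, 3):
            start, stop = parts[0], parts[1]
            step = parts[2] if len(parts) == 3 else 1
            # +1 because range() excludes its stop value
            numbers.update(range(start, stop + 1, step))
        else:
            raise ValueError("malformed token: {!r}".format(token))
    return numbers

subset = parse_inclusive("1:9:1,11:15:2,45:47:1,3467")
```

With the inclusive convention, subset equals the set (a) from the question, including 9, 15 and 47.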
Answer 1 (score: 1)
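I think I've got it, but the following code has not been thoroughly tested and may contain bugs.
Basically, get_partial_slices tries to build partial_slice objects. When the next number in the (sorted) set does not .fit() into the current slice, that slice is finished with .end() and the next slice is started.
If a slice contains only 1 item (or 2 items with step != 1), it is emitted as separate numbers rather than as a slice. This is why yield from current.end() is needed: ending a slice may produce two numbers instead of one slice.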
class partial_slice:
    """heavily relied on by get_partial_slices
    This attempts to create a slice from repeatedly adding numbers
    once a number that doesn't fit the slice is found use .end()
    to generate either the slice or the individual numbers"""
    def __init__(self, n):
        self.start = n
        self.stop = None
        self.step = None

    def fit(self, n):
        "returns True if n fits as the next element of the slice (or False if it does not)"
        if self.step is None:
            return True  # always take the second element into consideration
        elif self.stop == n:
            return True  # n fits perfectly with current stop value
        else:
            return False

    def add(self, n):
        """adds a number to the end of the slice,
        will raise a ValueError if the number doesn't fit"""
        if not self.fit(n):
            raise ValueError("{} does not fit into the slice".format(n))
        if self.step is None:
            self.step = n - self.start
        # stop is exclusive: it is the value the next element would need to have
        self.stop = n + self.step

    def to_slice(self):
        "return slice(self.start, self.stop, self.step)"
        return slice(self.start, self.stop, self.step)

    def end(self):
        "generates at most 2 items, may split up small slices"
        if self.step is None:
            yield self.start
            return
        length = (self.stop - self.start) // self.step
        if length > 2:
            # always keep slices that contain more than 2 items
            yield self.to_slice()
            return
        elif self.step == 1 and length == 2:
            yield self.to_slice()
            return
        else:
            yield self.start
            yield self.stop - self.step
def get_partial_slices(set_):
    data = iter(sorted(set_))
    try:
        current = partial_slice(next(data))
    except StopIteration:
        return  # empty input: nothing to yield
    for n in data:
        if current.fit(n):
            current.add(n)
        else:
            yield from current.end()
            current = partial_slice(n)
    yield from current.end()
test_case = {1,2,3,4,5,6,7,8,9,11,13,15,45,46,47,3467}
result = tuple(get_partial_slices(test_case))
#slice_set_creator is from my other answer,
#this will verify that the result was the same as the test case.
assert test_case == slice_set_creator[result]
def slice_formatter(obj):
    if isinstance(obj, slice):
        # Slice objects, like all indexing in Python, exclude the stop value.
        # The stop is adjusted only when printing, not when the slice is created, so the
        # slice objects remain usable in code (for example with slice_set_creator).
        inclusive_stop = obj.stop - obj.step
        return "{0.start}:{stop}:{0.step}".format(obj, stop=inclusive_stop)
    else:
        return repr(obj)

# prints: 1:9:1, 11:15:2, 45:47:1, 3467
print(", ".join(map(slice_formatter, result)))
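For comparison, the same greedy idea can be sketched as a single standalone function that returns the string form directly. This is not the code above: compress is a hypothetical name, stops are written inclusively as in (b), and a pair with a gap larger than 1 emits only its first number so that the second can still start a new run.

```python
def compress(numbers):
    """Greedy sketch: turn distinct ints into 'start:stop:step' runs (stop inclusive)."""
    data = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(data):
        j = i
        step = None
        if i + 1 < len(data):
            step = data[i + 1] - data[i]
            j = i + 1
            # extend the run while the spacing stays constant
            while j + 1 < len(data) and data[j + 1] - data[j] == step:
                j += 1
        count = j - i + 1
        if count >= 3 or (count == 2 and step == 1):
            # long enough to be worth writing as a run
            parts.append("{}:{}:{}".format(data[i], data[j], step))
            i = j + 1
        else:
            # a lone number, or a pair with a gap: emit only the first number
            parts.append(str(data[i]))
            i += 1
    return ",".join(parts)

encoded = compress({1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 45, 46, 47, 3467})
```

Being greedy, this does not guarantee the shortest possible encoding; sorting dominates the cost, so it runs in O(n log n) for n numbers.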
Answer 2 (score: 1)
The simplest way is to use numpy's r_[] syntax. So for your example, it is just:
>>> from numpy import r_
>>>
>>> a = r_[1:10, 11:17:2, 45:48, 3467]
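Keep in mind that Python slices do not include the last number, and that x:y implies x:y:1. This approach will not be as fast in production code as the other, more complex solution, but it is handy for interactive use.
You can see that this gives you a numpy array with the numbers you want: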
>>> print(a)
[   1    2    3    4    5    6    7    8    9   11   13   15   45   46   47
 3467]