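The following is a quote from https://stackoverflow.com/users/893/greg-hewgill's answer to "Explain Python's slice notation":
Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.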
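To make the quoted behavior concrete, here is a short sketch (Python 3 assumed). An out-of-range index raises IndexError, but out-of-range slice bounds are clamped to the sequence, so the slice simply comes back shorter or empty.

```python
a = [1]

# Slicing clamps out-of-range bounds instead of raising.
print(a[:-2])   # []
print(a[5:10])  # []

# Indexing does not clamp: an out-of-range index raises.
try:
    a[5]
except IndexError as exc:
    print("IndexError:", exc)
```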
So, when an error is preferred, what is the Pythonic way to get one? Is there a more Pythonic way to rewrite this example?
```python
class ParseError(Exception):
    pass

def safe_slice(data, start, end):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    # A slice that runs past the end is silently shortened, so compare lengths.
    if len(r) != end - start:
        raise IndexError
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.
    """
    try:
        name_length = ord(data[0])
        extracted_name = safe_slice(data, 1, 1 + name_length)
        phone_length = ord(data[1 + name_length])
        extracted_phone = safe_slice(data, 2 + name_length, 2 + name_length + phone_length)
    except IndexError:
        raise ParseError()
    return extracted_name, extracted_phone

if __name__ == '__main__':
    print(lazy_parse("\x04Jack\x0A0123456789"))  # OK
    print(lazy_parse("\x04Jack\x0A012345678"))  # should raise ParseError
```
Edit: the example is simpler to write with byte strings, but my actual code uses lists.
Answer 0 (score: 5)
Here is an arguably more Pythonic way. If you are parsing byte strings, you can use the struct module, which is provided for exactly this purpose:
```python
import struct
from collections import namedtuple

Details = namedtuple('Details', 'name phone')

class ParseError(Exception):  # as defined in the question
    pass

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.
    """
    try:
        # "Np" reads a Pascal string: a length byte followed by the data,
        # N bytes in total. On Python 3, data must be a bytes object.
        name = struct.unpack_from("%dp" % len(data), data)[0]
        phone = struct.unpack_from("%dp" % (len(data) - len(name) - 1), data, len(name) + 1)[0]
    except struct.error:
        raise ParseError()
    return Details(name, phone)
```
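One caveat worth checking against the struct documentation: as far as I can tell from CPython's implementation, unpacking with "p" caps the string at the bytes the format covers instead of raising, so with the truncated sample input the code above returns a 9-digit phone rather than raising struct.error. A sketch that reads each length byte with "B" and then the field with an explicit "%ds" does make unpack_from raise struct.error when the buffer is too short. It assumes Python 3 and bytes input; read_field is a hypothetical helper name.

```python
import struct
from collections import namedtuple

Details = namedtuple('Details', 'name phone')


class ParseError(Exception):
    pass


def read_field(data, offset):
    """Read one length-prefixed field at offset (hypothetical helper).

    Returns (value, next_offset). struct.error is raised if the buffer is
    too short for the length byte or for the declared field length.
    """
    (length,) = struct.unpack_from("B", data, offset)
    (value,) = struct.unpack_from("%ds" % length, data, offset + 1)
    return value, offset + 1 + length


def lazy_parse(data):
    try:
        name, offset = read_field(data, 0)
        phone, offset = read_field(data, offset)
    except struct.error:
        raise ParseError()
    return Details(name, phone)


print(lazy_parse(b"\x04Jack\x0A0123456789"))
```

Unlike "p", the "s" format has a fixed size, so a short buffer cannot be silently accepted.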
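What I still find unpythonic is throwing away the useful struct.error traceback to replace it with a ParseError: the original tells you what is wrong with the string, while the latter only tells you that something went wrong.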
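On Python 3, one way to keep that information is exception chaining. Python 3 already attaches the original error as `__context__` when you raise inside an `except` block; writing `raise ParseError(...) from exc` makes the link explicit as `__cause__`, and the traceback then shows both errors. A minimal sketch; parse_length is a hypothetical function.

```python
import struct


class ParseError(Exception):
    pass


def parse_length(data):
    """Hypothetical example: read a 1-byte length prefix."""
    try:
        (length,) = struct.unpack_from("B", data)
    except struct.error as exc:
        # Keep the original struct.error attached as __cause__.
        raise ParseError("buffer too short for length byte") from exc
    return length


try:
    parse_length(b"")
except ParseError as err:
    print(repr(err.__cause__))
```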
Answer 1 (score: 2)
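Using a function like safe_slice is faster than creating an object just to perform the slice, but if speed is not a bottleneck and you are looking for a nicer interface, you can define a class whose __getitem__ performs the checks before returning the slice.
This lets you use the usual slice notation instead of having to pass both the start and stop arguments to safe_slice.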
```python
class SafeSlice(object):
    # slice rules: http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
    def __init__(self, seq):
        self.seq = seq

    def __getitem__(self, key):
        seq = self.seq
        if isinstance(key, slice):
            start, stop, step = key.start, key.stop, key.step
            if start:
                seq[start]          # raises IndexError if start is out of range
            if stop:
                if stop < 0: stop = len(seq) + stop
                seq[stop - 1]       # raises IndexError if stop - 1 is out of range
        return seq[key]

seq = [1]
print(seq[:-2])
# []
print(SafeSlice(seq)[:-1])
# []
print(SafeSlice(seq)[:-2])
# IndexError: list index out of range
```
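If probing the endpoints by indexing feels indirect, a variant is to compare each bound numerically against the sequence length, treating a bound as valid when `-len(seq) <= bound <= len(seq)`. This is a sketch of that alternative, not the code above: StrictSlice is a hypothetical name, integer bounds are assumed, and the step is not validated.

```python
class StrictSlice(object):
    """Wrap a sequence so that slices with out-of-range bounds raise IndexError."""

    def __init__(self, seq):
        self.seq = seq

    def __getitem__(self, key):
        if isinstance(key, slice):
            n = len(self.seq)
            for bound in (key.start, key.stop):
                # None means "from the beginning" / "to the end": always valid.
                if bound is not None and not -n <= bound <= n:
                    raise IndexError(
                        "slice bound %d out of range for length %d" % (bound, n))
        return self.seq[key]


seq = [1]
print(StrictSlice(seq)[:-1])  # []
try:
    StrictSlice(seq)[:-2]
except IndexError as exc:
    print("IndexError:", exc)
```

One behavioral difference: this version accepts `seq[len(seq):]` (an empty tail), whereas the endpoint-indexing version rejects it because `seq[len(seq)]` raises.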
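If speed is a concern, I would suggest simply testing the endpoints instead of doing arithmetic; item access on Python lists is O(1). The version of safe_slice below also lets you pass 2, 3 or 4 arguments. With only 2 arguments, the second one is interpreted as the stop value (similar to range).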
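```python
def safe_slice(seq, start, stop=None, step=1):
    if stop is None:
        # called as safe_slice(seq, stop), like range(stop)
        stop = start
        start = 0
    else:
        seq[start]          # IndexError if start is out of range
    if stop < 0: stop = len(seq) + stop
    seq[stop - 1]           # IndexError if stop - 1 is out of range
    return seq[start:stop:step]
```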
Answer 2 (score: 2)
Here is a more Pythonic and more general rewrite of the code:
```python
class ParseError(Exception):
    pass

def safe_slice(data, start, end, exc=IndexError):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    if len(r) != end - start:
        raise exc()
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised."""
    results = []
    ptr = 0
    while ptr < len(data):
        length = ord(data[ptr])   # length prefix of the next field
        ptr += 1
        results.append(safe_slice(data, ptr, ptr + length, exc=ParseError))
        ptr += length
    return tuple(results)

if __name__ == '__main__':
    print(lazy_parse("\x04Jack\x0A0123456789"))  # OK
    print(lazy_parse("\x04Jack\x0A012345678"))  # should raise ParseError
```
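Most of the changes are in the body of lazy_parse: it can now handle any number of values instead of just two, and the correctness of the whole thing still depends on being able to parse the last element exactly.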
Also, instead of having safe_slice raise an IndexError that lazy_parse then turns into a ParseError, lazy_parse now passes the exception to safe_slice, which raises it if there is an error (safe_slice defaults to IndexError if no exception is passed to it).
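Finally, lazy_parse isn't lazy: it processes the whole string at once and returns all the results. "Lazy" in Python means doing only what is needed to return the next piece. For lazy_parse, that would mean returning the name, and then returning the phone on a later call. With only a small modification, we can make lazy_parse lazy:

```python
def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised."""
    ptr = 0
    while ptr < len(data):
        length = ord(data[ptr])
        ptr += 1
        result = safe_slice(data, ptr, ptr + length, ParseError)
        ptr += length
        yield result

if __name__ == '__main__':
    print(list(lazy_parse("\x04Jack\x0A0123456789")))  # OK
    print(list(lazy_parse("\x04Jack\x0A012345678")))  # should raise ParseError
```

lazy_parse is now a generator that returns one piece at a time. Note that in the main section we had to put a list() call around the lazy_parse call so that it gives us all the results to print.

Depending on what you are doing, this may not be the ideal approach, because recovering from errors becomes harder:

```python
for item in lazy_parse(some_data):
    result = do_stuff_with(item)
    make_changes_with(result)
    ...
```

By the time the ParseError is raised, you may already have made changes that are difficult or impossible to undo. The solution in a case like this is the same as what we did in main, where we wrapped the call in list():

```python
for item in list(lazy_parse(some_data)):
    ...
```

The list() call consumes lazy_parse completely and gives us a list of results, so if there is an error we find out about it before processing the first item in the loop.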
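A small self-contained demonstration of that difference, using a hypothetical generator (fields) that yields two items and then fails. Iterating directly performs the side effects for the first items before the error surfaces, while wrapping the generator in list() raises before the loop body runs at all.

```python
class ParseError(Exception):
    pass


def fields():
    """Hypothetical stand-in for lazy_parse on bad data."""
    yield "Jack"
    yield "0123"
    raise ParseError("truncated field")


def consume(make_list):
    changes = []
    try:
        items = fields()
        if make_list:
            items = list(items)  # the whole generator is consumed here
        for item in items:
            changes.append(item)  # stands in for make_changes_with(...)
    except ParseError:
        pass
    return changes


print(consume(make_list=False))  # ['Jack', '0123']
print(consume(make_list=True))   # []
```

The trade-off is memory: list() holds every parsed item at once, which defeats the laziness for very large inputs.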
Answer 3 (score: 2)
Here is a complete SafeSlice class, reusing the answers from https://stackoverflow.com/users/107660/duncan and https://stackoverflow.com/users/190597/unutbu.
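The class is fairly large because it has full slice support (start, stop and step). This may be overkill for the simple job done in the example, but it could prove useful for a more complete real-life problem.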
```python
from __future__ import division
from collections import namedtuple
from math import ceil
try:
    from collections.abc import MutableSequence  # Python 3.3+
except ImportError:
    from collections import MutableSequence      # Python 2

class ParseError(Exception):
    pass

Details = namedtuple('Details', 'name phone')

def parse_details(data):
    safe_data = SafeSlice(bytearray(data))  # because SafeSlice expects a mutable object
    try:
        name_length = safe_data.pop(0)
        name = safe_data.popslice(slice(name_length))
        phone_length = safe_data.pop(0)
        phone = safe_data.popslice(slice(phone_length))
    except IndexError:
        raise ParseError()
    if safe_data:
        # safe_data should be empty at this point
        raise ParseError()
    return Details(name, phone)

def main():
    # byte literals, so that bytearray() also accepts them on Python 3
    print(parse_details(b"\x04Jack\x0A0123456789"))  # OK
    print(parse_details(b"\x04Jack\x0A012345678"))  # should raise ParseError

SliceDetails = namedtuple('SliceDetails', 'first last length')

class SafeSlice(MutableSequence):
    """This implementation of a MutableSequence gives IndexError with invalid slices"""

    def __init__(self, mutable_sequence):
        self._data = mutable_sequence

    def __str__(self):
        return str(self._data)

    def __repr__(self):
        return repr(self._data)

    def __len__(self):
        return len(self._data)

    def computeindexes(self, ii):
        """Given a slice or an index, this method computes what would ideally be
        the first index, the last index and the length if the SafeSlice was
        accessed using this parameter.
        None indexes will be returned if the computed length is 0.
        First and last indexes may be negative. This means that they are invalid
        indexes. (ie: range(2)[-4:-3] will return first=-2, last=-2 and length=1)
        """
        if isinstance(ii, slice):
            start, stop, step = ii.start, ii.stop, ii.step
            if start is None:
                start = 0
            elif start < 0:
                start = len(self._data) + start
            if stop is None:
                stop = len(self._data)
            elif stop < 0:
                stop = len(self._data) + stop
            if step is None:
                step = 1
            elif step == 0:
                raise ValueError("slice step cannot be zero")
            length = ceil((stop - start) / step)
            length = int(max(0, length))
            if length:
                first_index = start
                last_index = start + (length - 1) * step
            else:
                first_index, last_index = None, None
        else:
            length = 1
            if ii < 0:
                first_index = last_index = len(self._data) + ii
            else:
                first_index = last_index = ii
        return SliceDetails(first_index, last_index, length)

    def slicecheck(self, ii):
        """Check if the first and the last item of parameter could be accessed"""
        slice_details = self.computeindexes(ii)
        if slice_details.first is not None:
            if slice_details.first < 0:
                # first is *really* negative
                self._data[slice_details.first - len(self._data)]
            else:
                self._data[slice_details.first]
        if slice_details.last is not None:
            if slice_details.last < 0:
                # last is *really* negative
                self._data[slice_details.last - len(self._data)]
            else:
                self._data[slice_details.last]

    def __delitem__(self, ii):
        self.slicecheck(ii)
        del self._data[ii]

    def __setitem__(self, ii, value):
        self.slicecheck(ii)
        self._data[ii] = value

    def __getitem__(self, ii):
        self.slicecheck(ii)
        r = self._data[ii]
        if isinstance(ii, slice):
            r = SafeSlice(r)
        return r

    def popslice(self, ii):
        """Same as pop but a slice may be used as index."""
        self.slicecheck(ii)
        r = self._data[ii]
        if isinstance(ii, slice):
            r = SafeSlice(r)
        del self._data[ii]
        return r

    def insert(self, i, value):
        length = len(self._data)
        if -length <= i <= length:
            self._data.insert(i, value)
        else:
            self._data[i]  # out of range: raise IndexError

if __name__ == '__main__':
    main()
```
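For comparison, here is a lighter-weight sketch that avoids a full MutableSequence wrapper: read from an io.BytesIO and treat any short read as an error. read_exact is a hypothetical helper. This assumes Python 3 and byte-string input, so it does not cover the list-based case mentioned in the question.

```python
import io
from collections import namedtuple

Details = namedtuple('Details', 'name phone')


class ParseError(Exception):
    pass


def read_exact(stream, n):
    """Read exactly n bytes from stream or raise ParseError (hypothetical helper)."""
    chunk = stream.read(n)
    if len(chunk) != n:
        raise ParseError("expected %d bytes, got %d" % (n, len(chunk)))
    return chunk


def parse_details(data):
    stream = io.BytesIO(data)
    # Each field is a 1-byte length followed by that many bytes.
    name = read_exact(stream, read_exact(stream, 1)[0])
    phone = read_exact(stream, read_exact(stream, 1)[0])
    if stream.read(1):
        # Trailing bytes mean the buffer did not match the expected layout.
        raise ParseError("unexpected trailing data")
    return Details(name, phone)


print(parse_details(b"\x04Jack\x0A0123456789"))
```

BytesIO.read(n) returns fewer bytes at the end of the buffer instead of raising, which is the same clamping behavior as slicing, so the length check is what turns it into an error.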