Pythonic way to have "size-safe" slicing

Date: 2011-11-10 09:53:40

Tags: python slice

The following quotes Greg Hewgill's (https://stackoverflow.com/users/893/greg-hewgill) answer to "Explain Python's slice notation".

  

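Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a contains only one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.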
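The behaviour described in the quote can be seen in a short sketch (Python 3 syntax; the list name items is only an illustration). Slicing silently clamps to what is available, while indexing the same out-of-range position raises.

```python
items = [1]  # hypothetical one-element list

# Slicing never raises for out-of-range bounds; it clamps instead.
short_slice = items[:-2]
print(short_slice)        # []

# Indexing the same out-of-range position does raise.
index_error = None
try:
    items[-2]
except IndexError as exc:
    index_error = exc
    print("IndexError:", exc)
```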

So, when the error is preferred, what is the Pythonic way to proceed? Is there a more Pythonic way to rewrite this example?

class ParseError(Exception):
    pass

def safe_slice(data, start, end):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    if len(r) != end - start:
        raise IndexError
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.

    """

    try:
        name_length = ord(data[0])
        extracted_name = safe_slice(data, 1, 1 + name_length)
        phone_length = ord(data[1 + name_length])
        extracted_phone = safe_slice(data, 2 + name_length, 2 + name_length + phone_length)
    except IndexError:
        raise ParseError()
    return extracted_name, extracted_phone

if __name__ == '__main__':
    print lazy_parse("\x04Jack\x0A0123456789") # OK
    print lazy_parse("\x04Jack\x0A012345678") # should raise ParseError

Edit: the example was simpler to write using byte strings, but my actual code uses lists.
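Since the real data is a list, here is a minimal sketch of the same length-check idea applied to a list, written for Python 3. It assumes 0 <= start <= end, as in the question, and the sample list record is made up for illustration.

```python
def safe_slice(data, start, end):
    """Return data[start:end], raising IndexError if the slice comes back short.

    Assumes 0 <= start <= end, as in the question.
    """
    r = data[start:end]
    if len(r) != end - start:
        raise IndexError("wanted %d items, got %d" % (end - start, len(r)))
    return r


record = [4, "J", "a", "c", "k"]  # hypothetical: a length followed by the items

print(safe_slice(record, 1, 1 + record[0]))   # ['J', 'a', 'c', 'k']

try:
    safe_slice(record, 1, 7)                   # asks for 6 items, only 4 exist
except IndexError as exc:
    print("IndexError:", exc)
```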

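4 answers:

Answer 0 (score: 5)

Here is a way that is arguably more Pythonic. If you want to parse a byte string, you can use the struct module, which is provided for exactly that purpose: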

import struct
from collections import namedtuple
Details = namedtuple('Details', 'name phone')

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised.

    """
    try:
        name = struct.unpack_from("%dp" % len(data), data)[0]
        phone = struct.unpack_from("%dp" % (len(data)-len(name)-1), data, len(name)+1)[0]
    except struct.error:
        raise ParseError()
    return Details(name, phone)

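What I still find unpythonic is throwing away the useful struct.error traceback and replacing it with a ParseError: the original tells you what is wrong with the string, while the latter only tells you that something went wrong.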
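One way to keep the original diagnostic, assuming Python 3, is exception chaining with raise ... from, which attaches the struct.error to the ParseError instead of discarding it. The sketch below is a variant, not the answer's code: it reads each length byte explicitly with the B format and the payload with an s format, and the names read_field and parse_details are hypothetical.

```python
import struct
from collections import namedtuple

Details = namedtuple("Details", "name phone")


class ParseError(Exception):
    pass


def read_field(data, offset):
    """Read one length-prefixed field; return (payload, next_offset)."""
    (length,) = struct.unpack_from("B", data, offset)
    (payload,) = struct.unpack_from("%ds" % length, data, offset + 1)
    return payload, offset + 1 + length


def parse_details(data):
    try:
        name, offset = read_field(data, 0)
        phone, offset = read_field(data, offset)
    except struct.error as exc:
        # Chain the low-level error so the traceback still shows what was wrong.
        raise ParseError("could not parse buffer") from exc
    return Details(name, phone)


print(parse_details(b"\x04Jack\x0a0123456789"))
```

With a truncated buffer, the ParseError carries the struct.error as its __cause__, so both appear in the traceback.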

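Answer 1 (score: 2)

Using a function like safe_slice is faster than creating an object just to perform a slice, but if speed is not a bottleneck and you are looking for a nicer interface, you could define a class whose __getitem__ performs the checks before returning the slice.

This lets you use the nice slice notation instead of having to pass both the start and stop arguments to safe_slice: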

class SafeSlice(object):
    # slice rules: http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
    def __init__(self,seq):
        self.seq=seq
    def __getitem__(self,key):
        seq=self.seq
        if isinstance(key,slice):
            start,stop,step=key.start,key.stop,key.step
            if start:
                seq[start]
            if stop:
                if stop<0: stop=len(seq)+stop
                seq[stop-1]
        return seq[key]

seq=[1]
print(seq[:-2])
# []
print(SafeSlice(seq)[:-1])
# []
print(SafeSlice(seq)[:-2])
# IndexError: list index out of range

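If speed is an issue, I suggest just testing the endpoints rather than doing arithmetic. Item access on Python lists is O(1). The version of safe_slice below also lets you pass 2, 3 or 4 arguments. With only 2 arguments, the second is interpreted as the stop value (similar to range).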

def safe_slice(seq, start, stop=None, step=1):
    if stop is None:
        stop=start
        start=0
    else:
        seq[start]
    if stop<0: stop=len(seq)+stop
    seq[stop-1]        
    return seq[start:stop:step]

Answer 2 (score: 2)

Here is a more Pythonic, more general rewrite of your code:

class ParseError(Exception):
    pass

def safe_slice(data, start, end, exc=IndexError):
    """0 <= start <= end is assumed"""
    r = data[start:end]
    if len(r) != end - start:
        raise exc()
    return r

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised."""
    results = []
    ptr = 0
    while ptr < len(data):
        length = ord(data[ptr])
        ptr += 1
        results.append(safe_slice(data, ptr, ptr + length, exc=ParseError))
        ptr += length
    return tuple(results)

if __name__ == '__main__':
    print lazy_parse("\x04Jack\x0A0123456789") # OK
    print lazy_parse("\x04Jack\x0A012345678") # should raise ParseError

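Most of the changes are in the body of lazy_parse: it now works with multiple values instead of just two, and the correctness of the whole thing still depends on the last element being parsed out exactly.

Also, rather than having safe_slice raise an IndexError that lazy_parse then converts into a ParseError, I have lazy_parse pass the desired exception to safe_slice, which raises it if an error occurs (safe_slice defaults to IndexError if nothing is passed to it).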

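Finally, lazy_parse isn't lazy: it processes the entire string at once and returns all the results. "Lazy" in Python means doing only what is needed to return the next piece. In the case of lazy_parse, that means returning the name, then returning the phone on a later call. With only a slight modification we can make lazy_parse lazy:

def lazy_parse(data):
    """extract (name, phone) from a data buffer.
    If the buffer could not be parsed, a ParseError is raised."""
    ptr = 0
    while ptr < len(data):
        length = ord(data[ptr])
        ptr += 1
        result = (safe_slice(data, ptr, ptr + length, ParseError))
        ptr += length
        yield result

if __name__ == '__main__':
    print list(lazy_parse("\x04Jack\x0A0123456789")) # OK
    print list(lazy_parse("\x04Jack\x0A012345678")) # should raise ParseError

lazy_parse is now a generator that returns one piece at a time. Note that we had to put list() around the lazy_parse call in the main section so that it gives us all the results for print to output.

Depending on what you are doing, this may not be the ideal approach, because recovering from errors can be more difficult:

for item in lazy_parse(some_data):
    result = do_stuff_with(item)
    make_changes_with(result)
    ...

By the time the ParseError is raised, you may already have made changes that are difficult or impossible to undo. The solution in a case like this is the same thing we did in the print part of main:

for item in list(lazy_parse(some_data)):
    ...

The list call completely consumes lazy_parse and gives us a list of the results; if an error is raised, we find out about it before we process the first item in the loop.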
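The difference between the two loops can be made concrete with a small sketch. It is written for Python 3 and operates on bytes, where indexing already yields integers so ord is not needed; the names buf, seen and seen_eager are illustrative. Iterating the generator directly hands over the first field before the error surfaces, whereas wrapping it in list() fails before any item is processed.

```python
class ParseError(Exception):
    pass


def lazy_parse(data):
    """Yield length-prefixed fields one at a time from a bytes buffer."""
    ptr = 0
    while ptr < len(data):
        length = data[ptr]          # indexing bytes gives an int in Python 3
        ptr += 1
        field = data[ptr:ptr + length]
        if len(field) != length:    # the slice came back short
            raise ParseError()
        ptr += length
        yield field


buf = b"\x04Jack\x0a012345678"      # the phone field is one byte short

seen = []
try:
    for item in lazy_parse(buf):
        seen.append(item)           # this side effect happens before the error
except ParseError:
    pass
print(seen)                         # [b'Jack']

seen_eager = []
try:
    for item in list(lazy_parse(buf)):
        seen_eager.append(item)     # never reached: list() raises first
except ParseError:
    pass
print(seen_eager)                   # []
```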

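Answer 3 (score: 2)

Here is a complete SafeSlice class, reusing the answers by duncan (https://stackoverflow.com/users/107660/duncan) and unutbu (https://stackoverflow.com/users/190597/unutbu). The class is quite large because it has full slice support (start, stop and step). This may be overkill for the simple job done in the example, but it might prove useful for a more complete real-life problem.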

from __future__ import division
from collections import MutableSequence
from collections import namedtuple
from math import ceil

class ParseError(Exception):
    pass

Details = namedtuple('Details', 'name phone')

def parse_details(data):
    safe_data = SafeSlice(bytearray(data)) # because SafeSlice expects a mutable object
    try:
        name_length = safe_data.pop(0)
        name = safe_data.popslice(slice(name_length))
        phone_length = safe_data.pop(0)
        phone = safe_data.popslice(slice(phone_length))
    except IndexError:
        raise ParseError()
    if safe_data:
        # safe_data should be empty at this point
        raise ParseError()
    return Details(name, phone)

def main():
    print parse_details("\x04Jack\x0A0123456789") # OK
    print parse_details("\x04Jack\x0A012345678") # should raise ParseError

SliceDetails = namedtuple('SliceDetails', 'first last length')

class SafeSlice(MutableSequence):
    """This implementation of a MutableSequence gives IndexError with invalid slices"""
    def __init__(self, mutable_sequence):
        self._data = mutable_sequence

    def __str__(self):
        return str(self._data)

    def __repr__(self):
        return repr(self._data)

    def __len__(self):
        return len(self._data)

    def computeindexes(self, ii):
        """Given a slice or an index, this method computes what would ideally be
        the first index, the last index and the length if the SafeSequence was
        accessed using this parameter.

        None indexes will be returned if the computed length is 0.
        First and last indexes may be negative. This means that they are invalid
        indexes. (ie: range(2)[-4:-3] will return first=-2, last=-1 and length=1)
        """
        if isinstance(ii, slice):
            start, stop, step = ii.start, ii.stop, ii.step
            if start is None:
                start = 0
            elif start < 0:
                start = len(self._data) + start
            if stop is None:
                stop = len(self._data)
            elif stop < 0:
                stop = len(self._data) + stop
            if step is None:
                step = 1
            elif step == 0:
                raise ValueError("slice step cannot be zero")

            length = ceil((stop - start) / step)
            length = int(max(0, length))
            if length:
                first_index = start
                last_index = start + (length - 1) * step
            else:
                first_index, last_index = None, None
        else:
            length = 1
            if ii < 0:
                first_index = last_index = len(self._data) + ii
            else:
                first_index = last_index = ii
        return SliceDetails(first_index, last_index, length)

    def slicecheck(self, ii):
        """Check if the first and the last item of parameter could be accessed"""
        slice_details = self.computeindexes(ii)
        if slice_details.first is not None:
            if slice_details.first < 0:
                # first is *really* negative
                self._data[slice_details.first - len(self._data)]
            else:
                self._data[slice_details.first]
        if slice_details.last is not None:
            if slice_details.last < 0:
                # last is *really* negative
                self._data[slice_details.last - len(self._data)]
            else:
                self._data[slice_details.last]

    def __delitem__(self, ii):
        self.slicecheck(ii)
        del self._data[ii]

    def __setitem__(self, ii, value):
        self.slicecheck(ii)
        self._data[ii] = value

    def __getitem__(self, ii):
        self.slicecheck(ii)
        r = self._data[ii]
        if isinstance(ii, slice):
            r = SafeSlice(r)
        return r

    def popslice(self, ii):
        """Same as pop but a slice may be used as index."""
        self.slicecheck(ii)
        r = self._data[ii]
        if isinstance(ii, slice):
            r = SafeSlice(r)
        del self._data[ii]
        return r

    def insert(self, i, value):
        length = len(self._data)
        if -length <= i <= length:
            self._data.insert(i, value)
        else:
            self._data[i]

if __name__ == '__main__':
    main()
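The expected-length arithmetic in computeindexes can be cross-checked against Python 3's range, whose len() gives the number of indexes a start/stop/step triple covers once the bounds have been resolved to plain integers. This is only a sanity-check sketch, and the function name expected_length is hypothetical.

```python
from math import ceil


def expected_length(start, stop, step):
    """Number of items a slice would ideally cover (same formula as above)."""
    if step == 0:
        raise ValueError("slice step cannot be zero")
    return int(max(0, ceil((stop - start) / step)))


# Compare against len(range(...)) over a small grid of resolved bounds.
for start in range(-3, 6):
    for stop in range(-3, 6):
        for step in (-3, -2, -1, 1, 2, 3):
            assert expected_length(start, stop, step) == len(range(start, stop, step))

print(expected_length(0, 10, 3))   # 4
```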