Analog of str.split for iterables?

Time: 2014-02-03 18:20:02

Tags: python split itertools iterable

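Looking through the itertools module, I don't see anything that could serve as a version of str.split for general iterables. Is there a simple, idiomatic way to do this?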

These unit tests should demonstrate what I mean:

import unittest


class SplitAnalog(unittest.TestCase):
    def test_splitEmpty(self):
        """
        >>> ''.split()
        []
        """
        actual = split(None, [])
        self.assertEqual(tuple(actual), ())

    def test_singleLine(self):
        """
        >>> '123\n'.split()
        ['123']
        """
        actual = split(lambda n: n is None, [1, 2, 3, None])
        self.assertEqual(tuple(tuple(line) for line in actual), ((1, 2, 3),))

    def test_allNones(self):
        """
        >>> '\n\n\n'.split()
        []
        """
        actual = split(lambda n: n is None, [None] * 3)
        self.assertEqual(tuple(actual), ())

    def test_splitNumsOnNone(self):
        """
        >>> '314159\n26535\n89793'.split()
        ['314159', '26535', '89793']
        """
        nums = [3, 1, 4, 1, 5, 9, None, 2, 6, 5, 3, 5, None, 8, 9, 7, 9, 3]
        actual = split(lambda n: n is None, nums)
        self.assertEqual(tuple(tuple(line) for line in actual), (
            (3, 1, 4, 1, 5, 9),
            (2, 6, 5, 3, 5),
            (8, 9, 7, 9, 3)))

    def test_splitNumsOnNine(self):
        nums = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 9, 8, 7, 3]
        actual = split(lambda n: n == 9, nums)
        self.assertEqual(tuple(tuple(line) for line in actual), (
            (3, 1, 4, 1, 5, ),
            (2, 6, 5, 3, 5),
            (8, 7, 3)))

What would such a function be called? Even after looking around other language libraries, I couldn't find an example.

3 answers:

Answer 0 (score: 1)

Assuming I understand what you're after, maybe

from itertools import groupby

def pseudosplit(predicate, seq):
    return (tuple(g) for k, g in groupby(seq, key=lambda x: not predicate(x)) if k)

produces

>>> list(pseudosplit(lambda x: x is None, ()))
[]
>>> list(pseudosplit(lambda x: x is None, [1,2,3]))
[(1, 2, 3)]
>>> list(pseudosplit(lambda x: x is None, [None]*3))
[]
>>> list(pseudosplit(lambda x: x is None, [3, 1, 4, 1, 5, 9, None, 2, 6, 5, 3, 5, None, 8, 9, 7, 9, 3, None]))
[(3, 1, 4, 1, 5, 9), (2, 6, 5, 3, 5), (8, 9, 7, 9, 3)]
which seems to agree with your test cases, anyway.
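
To see why the groupby approach drops the separators, it may help to look at the runs groupby produces before the filter is applied. The sketch below is only an illustration; `show_runs` is a made-up helper name, not part of the answer. Each run is paired with its key, and only runs whose key is True (items that do not satisfy the predicate) would survive the `if k` filter.

```python
from itertools import groupby

def show_runs(predicate, seq):
    # Hypothetical helper: list every run groupby yields, together with its key.
    # The key is True for ordinary items and False for separators.
    return [(k, tuple(g)) for k, g in groupby(seq, key=lambda x: not predicate(x))]

runs = show_runs(lambda x: x is None, [1, 2, None, None, 3])
# runs == [(True, (1, 2)), (False, (None, None)), (True, (3,))]
```

Consecutive separators collapse into a single False run, which is why repeated None values do not produce empty groups.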

Answer 1 (score: 1)

This splits based on a predicate.

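from itertools import groupby

def split(predicate, iterable):
    groups = (tuple(g) for k, g in groupby(iterable, predicate))
    # drop the runs made up entirely of separators; map works on Python 2 and 3
    return (g for g in groups if not all(map(predicate, g)))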

It passes all the tests, including one that splits on something other than None.

def test_splitNumsOnNine(self):
    """
    >>> '314159265359873'.split('9')
    ['31415', '26535', '873']
    """
    nums = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 9, 8, 7, 3]
    actual = split(lambda n: n == 9, nums)
    self.assertEqual(tuple(tuple(line) for line in actual), (
        (3, 1, 4, 1, 5, ),
        (2, 6, 5, 3, 5),
        (8, 7, 3)))

Answer 2 (score: 0)

Here is a sample implementation:

def split(predicate, iterable):
    iterable = iter(iterable)
    line = []
    try:
        while True:
            val = next(iterable)
            if predicate(val):
                if line:
                    yield line
                line = []
            else:
                line.append(val)
    except StopIteration:
        if line:
            yield line

I wonder whether I'm overlooking a simpler, easier, more idiomatic way. Anyone?
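
One possible simplification, offered as a sketch rather than a definitive answer: a plain for loop already stops when the iterable is exhausted, so the explicit next() call and the try/except around StopIteration may not be needed. The name `split_loop` is made up here to avoid clashing with the function above, and it is assumed the same behaviour is wanted (lists are yielded, empty groups are skipped).

```python
def split_loop(predicate, iterable):
    # Hypothetical name; intended to behave like the generator above.
    line = []
    for val in iterable:
        if predicate(val):
            # Separator found: emit the current group if it is non-empty.
            if line:
                yield line
            line = []
        else:
            line.append(val)
    # Emit whatever is left after the last separator.
    if line:
        yield line

nums = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 9, 8, 7, 3]
parts = list(split_loop(lambda n: n == 9, nums))
# parts == [[3, 1, 4, 1, 5], [2, 6, 5, 3, 5], [8, 7, 3]]
```

Unlike the groupby versions, this one builds each group eagerly in a list, which is fine as long as individual groups fit in memory.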