Partitioning a string in Python based on a search term?

Date: 2017-07-26 10:06:33

Tags: python

Given a string:

x = 'foo test1 test1 foo test2 foo'  

I want to partition the string on foo so that I get the following:

['foo', 'test1 test1 foo', 'test2 foo'] (preferred)

                 or

[['foo'], ['test1', 'test1', 'foo'], ['test2', 'foo']]  (not preferred, but workable)

I tried itertools.groupby:

In [1209]: [list(v) for _, v in itertools.groupby(x.split(), lambda k: k != 'foo')]
Out[1209]: [['foo'], ['test1', 'test1'], ['foo'], ['test2'], ['foo']]

But that isn't quite what I'm looking for. I know I could use a loop and do this:

In [1210]: l = [[]]
      ...: for v in x.split():
      ...:     l[-1].append(v)
      ...:     if v == 'foo':
      ...:         l.append([])
      ...:     

In [1211]: l
Out[1211]: [['foo'], ['test1', 'test1', 'foo'], ['test2', 'foo'], []]

But it leaves an empty list at the end, which is inefficient. Is there a simpler way?

I want to keep the delimiter.
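For comparison, here is a minimal sketch of the loop above rearranged so that a group is only closed once a 'foo' has been seen, which avoids the trailing empty list. The function name partition_words and its sep parameter are hypothetical, and the sketch assumes the delimiter is a whole whitespace-separated word.

```python
def partition_words(s, sep='foo'):
    """Hypothetical helper: split s into word groups, each ending with sep."""
    groups = []
    current = []
    for word in s.split():
        current.append(word)
        if word == sep:
            # close the group only when the delimiter is seen,
            # so no empty group is created after the last one
            groups.append(current)
            current = []
    if current:
        # words after the last delimiter (if any) form a final group
        groups.append(current)
    return groups


x = 'foo test1 test1 foo test2 foo'
print(partition_words(x))
# [['foo'], ['test1', 'test1', 'foo'], ['test2', 'foo']]
print([' '.join(g) for g in partition_words(x)])
# ['foo', 'test1 test1 foo', 'test2 foo']
```

Joining with a single space normalizes runs of whitespace, which matches the preferred output here but would not preserve, for example, tabs or double spaces from the original string.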

5 Answers:

Answer 0 (score: 3)

Perhaps not the prettiest approach, but short and clear:

[part + 'foo' for part in x.split('foo')][:-1]

Output:

['foo', ' test1 test1 foo', ' test2 foo']
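Note that this one-liner relies on the string ending with 'foo': the final [:-1] discards the last piece unconditionally, so for 'foo a foo b' the trailing ' b' is lost. A minimal sketch that keeps such a tail, using a hypothetical function name split_keep:

```python
def split_keep(s, sep='foo'):
    """Hypothetical helper: split s on sep, keeping sep at the end of each chunk."""
    parts = s.split(sep)
    # every part except the last was followed by sep in the original string
    result = [p + sep for p in parts[:-1]]
    # the last part is whatever follows the final sep; keep it only if non-empty
    if parts[-1]:
        result.append(parts[-1])
    return result


print(split_keep('foo test1 test1 foo test2 foo'))
# ['foo', ' test1 test1 foo', ' test2 foo']
print(split_keep('foo a foo b'))
# ['foo', ' a foo', ' b']
```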

Answer 1 (score: 3)

You can use str.partition in this case:

def find_foo(x):
    result = []
    while x:
        before, sep, x = x.partition("foo")
        # sep is "" once "foo" no longer occurs, so trailing text is kept unchanged
        result.append(before + sep)
    return result

>>> find_foo('foo test1 test1 foo test2 foo')
['foo', ' test1 test1 foo', ' test2 foo']
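If the pieces are consumed one at a time, the same str.partition loop can be written as a lazy generator. This is a sketch with a hypothetical function name; it appends the separator that partition actually returned, which is the empty string once 'foo' no longer occurs, so trailing text is not given a spurious 'foo'.

```python
from typing import Iterator


def iter_partitions(s: str, sep: str = 'foo') -> Iterator[str]:
    """Hypothetical lazy variant: yield each chunk of s, delimiter included."""
    while s:
        before, found, s = s.partition(sep)
        # found is '' once sep no longer occurs, so the tail is yielded unchanged
        yield before + found


print(list(iter_partitions('foo test1 test1 foo test2 foo')))
# ['foo', ' test1 test1 foo', ' test2 foo']
```

str.partition raises ValueError for an empty separator, and each call still builds a new string for the remainder, so this is a slicing-style approach rather than a search from a start position.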

Answer 2 (score: 1)

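Have you considered iterating over the string and passing a start position to the search? That is often faster than chopping off the front of the string as you go. This might work for you: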

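x = 'foo test1 test1 foo test2 foo'

def findall(target, s):
    lt = len(target)
    ls = len(s)
    pos = 0
    result = []
    while pos < ls:
        idx = s.find(target, pos)
        if idx == -1:
            # no more delimiters: keep the remaining tail and stop
            result.append(s[pos:])
            break
        fpos = idx + lt  # end of this chunk, delimiter included
        result.append(s[pos:fpos])
        pos = fpos
    return result

print(findall("foo", x))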
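Whether searching from a start position is actually faster than slicing depends on the input and the Python version, so it is worth measuring. Below is a rough timeit harness; by_find and by_slicing are hypothetical names, the test text is made up, and no particular timings are claimed. Both functions assume a non-empty target (an empty one would make either loop run forever).

```python
import timeit


def by_find(s, target):
    # search from a moving start position; the input string is never re-sliced
    out, pos = [], 0
    while pos < len(s):
        idx = s.find(target, pos)
        end = len(s) if idx == -1 else idx + len(target)
        out.append(s[pos:end])
        pos = end
    return out


def by_slicing(s, target):
    # chop the consumed prefix off the string on every iteration
    out = []
    while target in s:
        end = s.index(target) + len(target)
        out.append(s[:end])
        s = s[end:]
    if s:
        out.append(s)  # text after the last delimiter
    return out


if __name__ == '__main__':
    text = 'foo test1 test1 ' * 500 + 'foo'
    assert by_find(text, 'foo') == by_slicing(text, 'foo')
    for fn in (by_find, by_slicing):
        print(fn.__name__, timeit.timeit(lambda: fn(text, 'foo'), number=10))
```

Slicing copies the remaining string on every step, so its cost tends to grow faster with input length; the harness lets you check that on your own data.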

Answer 3 (score: 1)

You can use a positive lookbehind assertion, (?<=...), in a regular expression with re.split (after import re), like this:

In [515]: string = 'foo test1 test1 foo test2 foo'

In [516]: re.split(r'(?<=foo)\s', string)
Out[516]: ['foo', 'test1 test1 foo', 'test2 foo']

In [517]: [x.split() for x in re.split(r'(?<=foo)\s', string)]
Out[517]: [['foo'], ['test1', 'test1', 'foo'], ['test2', 'foo']]
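The lookbehind (?<=foo) also fires when foo is merely the end of a longer word (for example, 'snafoo x foo y' is split after 'snafoo'), and \s matches exactly one whitespace character. If either matters, a word boundary and \s+ can be added. A sketch, where PATTERN and split_after_foo are hypothetical names:

```python
import re

# \bfoo only matches 'foo' at the start of a word, so 'snafoo' is not a delimiter;
# \s+ consumes a whole run of whitespace after the delimiter
PATTERN = re.compile(r'(?<=\bfoo)\s+')


def split_after_foo(s):
    return PATTERN.split(s)


print(split_after_foo('foo test1  test1 foo\ttest2 foo'))
# ['foo', 'test1  test1 foo', 'test2 foo']
print(split_after_foo('snafoo x foo y'))
# ['snafoo x foo', 'y']
```

The \b inside the lookbehind is zero-width, so the lookbehind still has the fixed width that Python's re module requires.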

Answer 4 (score: 0)

Try this:

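x = 'foo test1 test1 foo test2 foo'

word = 'foo'
l = len(word)
out = []
while word in x:
    pos = x.index(word)
    out.append(x[:pos + l])  # chunk up to and including the delimiter
    x = x[pos + l:]          # continue with the remainder
if x:
    out.append(x)            # keep any text after the last delimiter

print(out)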

Output:

['foo', ' test1 test1 foo', ' test2 foo']
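The itertools.groupby attempt from the question can also be made to work by keying each word on how many 'foo' words come before it, so every 'foo' shares a key with the words it closes. A sketch with a hypothetical function name; accumulate(..., initial=0) needs Python 3.8 or later.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def group_with_groupby(s, sep='foo'):
    """Hypothetical helper: word groups ending with sep, built with groupby."""
    words = s.split()
    # keys yields 0, then running counts: the i-th key is the number of sep
    # words before words[i], so each sep is grouped with the words before it
    keys = accumulate((w == sep for w in words), initial=0)
    return [[w for _, w in grp]
            for _, grp in groupby(zip(keys, words), key=itemgetter(0))]


groups = group_with_groupby('foo test1 test1 foo test2 foo')
print(groups)
# [['foo'], ['test1', 'test1', 'foo'], ['test2', 'foo']]
print([' '.join(g) for g in groups])
# ['foo', 'test1 test1 foo', 'test2 foo']
```

zip stops as soon as words is exhausted, so the one extra key produced by initial=0 is simply ignored.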