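Slicing a list with different string matching conditions

Time: 2016-08-25 09:17:14

Tags: python indexing list-comprehension slice

I want to slice a list of strings based on substrings that its elements may contain: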

l = ['Some long text', 'often begins', ' with ',
     'impenetrable fog ', 'which ends', ' somewhere further']
startIndex = [u for u, v in enumerate(l) if 'begins' in v][0]
finalIndex = [u for u, v in enumerate(l) if 'ends' in v][0]

so that I get:

' '.join(l[startIndex:finalIndex]) == 'often begins with impenetrable fog'   

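My main problem is that the start and end conditions used to obtain the indices are different and should be variable (basic substring containment as above, but possibly a regex or some other method).

It may be necessary to drop the first and last elements, but I suppose that is just a matter of adjusting the indices by 1. My code works in the ideal case, but it often fails because the structure and content of l are not very predictable. If one or both of the matching elements are missing, the final string should end up as None.

Are comprehensions relevant here, or should I map a lambda function to apply both conditions?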

3 answers:

Answer 0 (score: 1)

Try:

l = ['Some long text', 'often begins', 'with', 'impenetrable fog', 'which ends', 'somewhere further']

def index(phrases, word, default):
    """Return the index of the first phrase in `phrases` that contains `word`.

    If no phrase matches, return `default`.
    """
    for i, s in enumerate(phrases):
        if word in s:
            return i
    return default

startIndex = index(l, "long", -1)
finalIndex = index(l, "somewhere", len(l))

print(' '.join(l[startIndex+1:finalIndex]))

Answer 1 (score: 1)

Use next():

l = ['Some long text', 'often begins', ' with ', 'impenetrable fog ', 
     'which   ends', ' somewhere further']

startIndex = next((u for u, v in enumerate(l) if 'begins' in v), 0)
finalIndex = next((u for u, v in enumerate(l) if 'ends' in v), 0)

if (startIndex and finalIndex) and (finalIndex > startIndex):
    sentence = ' '.join(l[startIndex:finalIndex])
else:
    sentence = None
print(sentence)

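It works like a list comprehension, except that it returns the first element it finds rather than a list. If it finds nothing, it returns an optional default (here 0).

That way, if the list contains no 'begins' or no 'ends', there is nothing to print. It also lets you check whether 'ends' comes before 'begins'.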
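One caveat: with 0 as the default, a genuine match at index 0 cannot be told apart from "not found". A minimal sketch of the same idea using None as the sentinel is shown below; the helper names first_index and join_between are hypothetical and not part of the original answer.

```python
def first_index(items, word):
    """Return the index of the first item containing `word`, or None."""
    return next((i for i, s in enumerate(items) if word in s), None)


def join_between(items, start_word, end_word):
    """Join items[start:end], or return None if a marker is missing
    or the end marker does not come after the start marker."""
    start = first_index(items, start_word)
    end = first_index(items, end_word)
    if start is None or end is None or end <= start:
        return None
    return ' '.join(items[start:end])


# A start marker at index 0 is now handled correctly.
print(join_between(['begins here', 'and goes on', 'until it ends'],
                   'begins', 'ends'))
```

Comparing against None explicitly (rather than relying on truthiness) is what keeps index 0 usable as a valid result.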

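I like list comprehensions too, but sometimes what you need is not a list.

Solution for advanced users:

The problem with using two comprehensions is that you scan the list from the start twice, and the approach fails when 'ends' appears before 'begins':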

l = ['Some long text ends here',  'often begins', ' with ', 'which   ends']
                     ^^^

To avoid this, you can use a generator with send() so that the list is iterated only once.

def get_index(trigger_word):
    for u, v in enumerate(l):
        if trigger_word in v:
            trigger_word = yield u

gen = get_index('begins')
startIndex = gen.send(None)
finalIndex = gen.send('ends')

Here, yield lets you obtain the index without exiting the function.
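To make the mechanics a bit more concrete, here is a small standalone sketch of how send() feeds a new search word into the paused generator. The function name find_indices and the sample list are made up for illustration only.

```python
def find_indices(items, word):
    for i, s in enumerate(items):
        if word in s:
            # Execution pauses here; the argument of the next send()
            # call becomes the new word to look for.
            word = yield i


gen = find_indices(['a x', 'b y', 'c x', 'd y'], 'x')
first = gen.send(None)   # the first send must be None: it starts the generator
second = gen.send('y')   # resumes after index 0, now searching for 'y'
third = gen.send('x')    # resumes after index 1, searching for 'x' again
print(first, second, third)
```

Because the for loop is never restarted, each search continues from where the previous match was found.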

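This works better, but if the list contains no 'begins' or no 'ends', a StopIteration exception is raised. To avoid that, you can add an infinite loop that yields 0 once the list is exhausted. The complete solution then becomes: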

def get_index(l, trigger_word):
    for u, v in enumerate(l):
        if trigger_word in v:
            trigger_word = yield u
    while True:
        yield 0

def concat_with_trigger_words(l):           
    gen = get_index(l, 'begins')
    startIndex = gen.send(None)
    finalIndex = gen.send('ends')
    return ' '.join(l[startIndex:finalIndex]) if (startIndex and finalIndex) else None

# Here are some free lists for your future unit tests ;)

l_original = ['Some long text here',  'often begins', ' with ',
              'impenetrable fog ', 'which   ends', ' somewhere further']
l_start_with_ends = ['ends',  'often begins', ' with ',
                     'impenetrable fog ', 'which   ends', 'begins']
l_none = ['random', 'word']
l_without_begin = ['fog', 'ends here']
l_without_end = ['begins', 'but never', '...']

print(concat_with_trigger_words(l_original)) # often begins  with  impenetrable fog 
print(concat_with_trigger_words(l_start_with_ends)) # often begins  with  impenetrable fog 
print(concat_with_trigger_words(l_none)) # None
print(concat_with_trigger_words(l_without_end)) # None
print(concat_with_trigger_words(l_without_begin)) # None
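Since the question asks for start and end conditions that can vary (substring, regex, or something else), the single-pass idea can also be sketched with plain callables instead of a generator. This is only an illustration under assumptions: slice_between, is_start and is_end are hypothetical names, and None is used as the "not found" marker so that a start match at index 0 still works.

```python
import re


def slice_between(items, is_start, is_end):
    """Join the items from the first is_start match up to (but not
    including) the first later is_end match; None if either is missing."""
    start = None
    for i, s in enumerate(items):
        if start is None:
            if is_start(s):
                start = i
        elif is_end(s):
            return ' '.join(items[start:i])
    return None


words = ['Some long text ends here', 'often begins', 'with',
         'impenetrable fog', 'which ends', 'somewhere further']

# Start condition: substring test. End condition: compiled regex search.
result = slice_between(words,
                       lambda s: 'begins' in s,
                       re.compile(r'\bends\b').search)
print(result)
```

Any callable that returns a truthy value on a match can be passed in, so the two conditions do not need to be of the same kind.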

Answer 2 (score: 1)

>>> l = ['Some long text', 'often begins', ' with ',
...      'impenetrable fog ', 'which ends', ' somewhere further']
>>> start, end = 'begins', 'ends'
>>> key_index = {'start': {'word': start, 'index': -1},
...              'end': {'word': end, 'index': -1}}
>>> for i, val in enumerate(l):
...     if key_index['start']['word'] in val:
...         key_index['start']['index'] = i
...     elif key_index['end']['word'] in val:
...         key_index['end']['index'] = i
...
>>> start_index, end_index = key_index['start']['index'], key_index['end']['index']
>>> my_list = l[start_index+1:end_index] if start_index >=0 and end_index >= 0 and start_index+1 < end_index else None
>>> my_list
[' with ', 'impenetrable fog ']