I want to slice a list of strings based on substrings that its elements may contain:
l = ['Some long text', 'often begins', ' with ',
'impenetrable fog ', 'which ends', ' somewhere further']
startIndex = [u for u, v in enumerate(l) if 'begins' in v][0]
finalIndex = [u for u, v in enumerate(l) if 'ends' in v][0]
So I get:
' '.join(l[startIndex:finalIndex]) == 'often begins with impenetrable fog'
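My main problem is that the start and end conditions used to get the indices are different, and they should be swappable (plain substring containment as above, possibly a regex or some other test).
The first and last elements might need to be dropped, but I guess that is just a matter of shifting the indices by 1.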
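My code works in the ideal case, but it often fails because the structure and content of l are not very predictable. If one or both of the elements matching the conditions are missing, the final string should be None.
Are comprehensions the right tool here, or should I map a lambda function to apply both conditions?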
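One way to keep the conditions swappable is to pass them in as functions. The sketch below assumes each condition is any callable that takes a string and returns a bool, so a substring test, a regex, or anything else can be plugged in without touching the slicing logic. The name slice_between and its keep_start/keep_end flags are hypothetical and do not come from the question.

```python
import re
from typing import Callable, List, Optional

# Hypothetical helper: each condition is any str -> bool callable, so a
# substring test, a regex, or anything else can be plugged in unchanged.
def slice_between(items: List[str],
                  is_start: Callable[[str], bool],
                  is_end: Callable[[str], bool],
                  keep_start: bool = True,
                  keep_end: bool = False) -> Optional[str]:
    """Join the elements from the first start match up to the first end
    match after it; return None if either match is missing."""
    start = next((i for i, s in enumerate(items) if is_start(s)), None)
    if start is None:
        return None
    # Search for the end marker only after the start marker.
    end = next((i for i in range(start + 1, len(items)) if is_end(items[i])), None)
    if end is None:
        return None
    # The keep_* flags handle the "shift the indices by 1" case.
    lo = start if keep_start else start + 1
    hi = end + 1 if keep_end else end
    # strip() removes the stray spaces inside elements such as ' with '.
    return ' '.join(s.strip() for s in items[lo:hi])

l = ['Some long text', 'often begins', ' with ',
     'impenetrable fog ', 'which ends', ' somewhere further']
print(slice_between(l,
                    lambda s: 'begins' in s,                           # substring test
                    lambda s: re.search(r'\bends\b', s) is not None))  # regex test
# often begins with impenetrable fog
```

Looking for the end marker only after the start marker also avoids picking up an 'ends' that appears earlier in the list.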
Answer 0 (score: 1)
Try:
l = ['Some long text', 'often begins', 'with', 'impenetrable fog', 'which ends', 'somewhere further']
"""
return the index of the phase in 'phases' if phase contains 'word'
if not found, return 'default'
"""
def index(phases, word, default):
for i, s in enumerate(phases):
if word in s: return i
return default
startIndex = index(l, "long", -1)
finalIndex = index(l, "somewhere", len(l))
print(' '.join(l[startIndex+1:finalIndex]))
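The -1 and len(l) defaults mean that a missing marker silently widens the slice to the edge of the list, instead of producing the None the question asks for. The sketch below shows both behaviours, reusing the index helper above. Passing None as the default and testing for it is my own variation, not part of the answer.

```python
def index(phrases, word, default):
    """Return the index of the first phrase containing word, else default."""
    for i, s in enumerate(phrases):
        if word in s:
            return i
    return default

l = ['Some long text', 'often begins', 'with', 'impenetrable fog', 'which ends', 'somewhere further']

# Missing start marker: the -1 default makes the slice start at index 0.
start = index(l, "missing", -1)
end = index(l, "ends", len(l))
print(' '.join(l[start + 1:end]))  # Some long text often begins with impenetrable fog

# Variation: None defaults plus an explicit check give None when a marker is missing.
start = index(l, "missing", None)
end = index(l, "ends", None)
result = None if start is None or end is None else ' '.join(l[start + 1:end])
print(result)  # None
```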
Answer 1 (score: 1)
Or use next():
l = ['Some long text', 'often begins', ' with ', 'impenetrable fog ',
     'which ends', ' somewhere further']
# next() returns the first matching index, or the default (0) if nothing matches.
startIndex = next((u for u, v in enumerate(l) if 'begins' in v), 0)
finalIndex = next((u for u, v in enumerate(l) if 'ends' in v), 0)
if (startIndex and finalIndex) and (finalIndex > startIndex):
    sentence = ' '.join(l[startIndex:finalIndex])
else:
    sentence = None
print(sentence)
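next() works like a list comprehension, except that it returns the first element it finds instead of a list. If it finds nothing, it returns the optional default you pass it (here 0).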
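This way, if the list contains no 'begins' or no 'ends', no sentence is built and you get None instead of an error. You can also check that 'ends' does not come before 'begins'.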
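One caveat with the 0 default: a genuine match at index 0 looks exactly like "not found", so the startIndex and finalIndex test rejects it. The sketch below is a minimal variation that uses None as the next() default and explicit "is not None" checks. The sample list is made up to trigger the index-0 case.

```python
l = ['begins here', 'with', 'some fog', 'which ends', 'somewhere further']

# With 0 as the default, a real match at index 0 is indistinguishable from "not found".
start0 = next((u for u, v in enumerate(l) if 'begins' in v), 0)
print(start0)  # 0

# None as the default keeps the two cases apart.
start = next((u for u, v in enumerate(l) if 'begins' in v), None)
end = next((u for u, v in enumerate(l) if 'ends' in v), None)
if start is not None and end is not None and end > start:
    sentence = ' '.join(l[start:end])
else:
    sentence = None
print(sentence)  # begins here with some fog
```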
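I like list comprehensions too, but sometimes a list is not what you need.
Solution for advanced users:
The problem with two comprehensions is that you scan the list from the beginning twice, and the approach fails when 'ends' appears before 'begins':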
l = ['Some long text ends here', 'often begins', ' with ', 'which ends']
                     ^^^^
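To avoid this, you can use send() on a generator so that the list is iterated only once.
def get_index(trigger_word):
    for u, v in enumerate(l):
        if trigger_word in v:
            # Hand back u and pause; the value passed to send() becomes the next word to look for.
            trigger_word = yield u
gen = get_index('begins')
startIndex = gen.send(None)    # start the generator: first index containing 'begins'
finalIndex = gen.send('ends')  # resume after startIndex: next index containing 'ends'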
Here, yield lets you get an index without leaving the function.
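Here is a trace of that hand-off on the problematic list above. One change from the answer: the list is passed in explicitly instead of being read from the global l.

```python
def get_index(items, trigger_word):
    # Scan once; each yield hands back an index and receives the next trigger word.
    for u, v in enumerate(items):
        if trigger_word in v:
            trigger_word = yield u

l = ['Some long text ends here', 'often begins', ' with ', 'which ends']
gen = get_index(l, 'begins')
start = gen.send(None)  # same as next(gen): runs to the first 'begins', index 1
end = gen.send('ends')  # resumes after index 1, so the 'ends' at index 0 is never seen
print(start, end)       # 1 3
```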
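This works better, but if the list contains no begins or no ends, a StopIteration exception is raised. To avoid that, you can loop forever on yield 0 once the list is exhausted. The full solution then becomes: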
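def get_index(l, trigger_word):
    for u, v in enumerate(l):
        if trigger_word in v:
            trigger_word = yield u
    # Once the list is exhausted, keep yielding 0 instead of raising StopIteration.
    while True:
        yield 0
def concat_with_trigger_words(l):
    gen = get_index(l, 'begins')
    startIndex = gen.send(None)
    finalIndex = gen.send('ends')
    # strip() each element so that items like ' with ' do not produce double spaces.
    return ' '.join(s.strip() for s in l[startIndex:finalIndex]) if (startIndex and finalIndex) else None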
# Some free lists for your future unit tests ;)
l_original = ['Some long text here', 'often begins', ' with ',
              'impenetrable fog ', 'which ends', ' somewhere further']
l_start_with_ends = ['ends', 'often begins', ' with ',
                     'impenetrable fog ', 'which ends', 'begins']
l_none = ['random', 'word']
l_without_begin = ['fog', 'ends here']
l_without_end = ['begins', 'but never', '...']
print(concat_with_trigger_words(l_original)) # often begins with impenetrable fog
print(concat_with_trigger_words(l_start_with_ends)) # often begins with impenetrable fog
print(concat_with_trigger_words(l_none)) # None
print(concat_with_trigger_words(l_without_end)) # None
print(concat_with_trigger_words(l_without_begin)) # None
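The same single-pass idea also works without send(): use two for loops that share one iterator, so the second loop picks up where the first one stopped. The sketch below does this; the name concat_between is hypothetical. Unlike the yield-0 version, it also handles a start marker at index 0 instead of treating it as missing.

```python
def concat_between(items, start_word='begins', end_word='ends'):
    """Hypothetical single-pass alternative: join the elements from the first
    start_word match up to (not including) the next end_word match, or
    return None if either marker is missing."""
    it = iter(items)
    for s in it:                 # advance to the start marker
        if start_word in s:
            collected = [s]
            break
    else:
        return None              # no start marker at all
    for s in it:                 # same iterator: resumes right after the start
        if end_word in s:
            return ' '.join(x.strip() for x in collected)
        collected.append(s)
    return None                  # start found, but no end marker after it

print(concat_between(['ends', 'often begins', ' with ', 'impenetrable fog ', 'which ends', 'begins']))
# often begins with impenetrable fog
print(concat_between(['begins at 0', 'fog', 'which ends']))
# begins at 0 fog
```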
Answer 2 (score: 1)
>>> l = ['Some long text', 'often begins', ' with ',
... 'impenetrable fog ', 'which ends', ' somewhere further']
>>> start, end = 'begins', 'ends'
>>> key_index = {'start': {'word': start, 'index': -1},
...              'end': {'word': end, 'index': -1}}
>>> for i, val in enumerate(l):
...     if key_index['start']['word'] in val:
...         key_index['start']['index'] = i
...     elif key_index['end']['word'] in val:
...         key_index['end']['index'] = i
...
>>> start_index, end_index = key_index['start']['index'], key_index['end']['index']
>>> my_list = l[start_index+1:end_index] if start_index >=0 and end_index >= 0 and start_index+1 < end_index else None
>>> my_list
[' with ', 'impenetrable fog ']
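Because this loop keeps overwriting the indices, it records the last element containing each word. Because of the elif, an element that contains both words only counts as a start. The sketch below wraps the same bookkeeping in a function; the name between is hypothetical. It shows one case where this helps (an early 'ends' is overwritten) and one where it hurts (a later 'begins' overwrites the real start).

```python
def between(items, start_word, end_word):
    # Same bookkeeping as the answer above, wrapped in a function.
    start_index = end_index = -1
    for i, val in enumerate(items):
        if start_word in val:
            start_index = i      # overwritten on every match: the last 'begins' wins
        elif end_word in val:
            end_index = i        # likewise, the last 'ends' wins
    if start_index >= 0 and end_index >= 0 and start_index + 1 < end_index:
        return items[start_index + 1:end_index]
    return None

print(between(['Some long text ends here', 'often begins', ' with ', 'which ends'], 'begins', 'ends'))
# [' with ']
print(between(['often begins', 'fog', 'which ends', 'begins again'], 'begins', 'ends'))
# None, because the later 'begins' overwrote the earlier one
```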