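I have two strings with the same number of words, and I want to match the words that share the same index. I am also trying to group consecutive matches together, which is where I am having trouble.
For example, I have two strings: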
alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'
The result I am looking for is:
['I am','show']
My current code is as follows:
keys = []
for x in alligned1.split():
    for i in alligned2.split():
        if x == i:
            keys.append(x)
This gives me:
['I','am','show']
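Any guidance or help would be greatly appreciated.
Answer 0 (score: 10)
Finding the matching words is fairly simple, but putting them into contiguous groups is trickier. I suggest using groupby.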
import itertools

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'
results = []
# pair up the words that share an index
word_pairs = zip(alligned1.split(), alligned2.split())
# groupby collects consecutive pairs with the same key (True means the words match)
for k, v in itertools.groupby(word_pairs, key=lambda pair: pair[0] == pair[1]):
    if k:
        words = [pair[0] for pair in v]
        results.append(" ".join(words))
print(results)
Result:
['I am', 'show']
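To see why this works, it can help to look at every group that groupby produces, not only the matching ones. The following is a minimal sketch under the same inputs; the names words1, words2, and groups are introduced here for illustration and are not part of the answer.

```python
from itertools import groupby

words1 = 'I am going to go to some show'.split()
words2 = 'I am not going to go the show'.split()

# Each group is a maximal run of adjacent word pairs that share the same key.
# The key is True when the two words at that index are equal.
groups = [
    (is_match, [w1 for w1, _ in pairs])
    for is_match, pairs in groupby(zip(words1, words2), key=lambda p: p[0] == p[1])
]

for is_match, run in groups:
    print(is_match, run)
# True ['I', 'am']
# False ['going', 'to', 'go', 'to', 'some']
# True ['show']
```

Keeping only the groups whose key is True and joining each run with spaces gives the desired list.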
Answer 1 (score: 3)
A simplified version of your code would be:
alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'
words2 = alligned2.split()
keys = []
for i, word in enumerate(alligned1.split()):
    if word == words2[i]:
        keys.append(word)
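Then we need to keep track of whether we are in the middle of a run of matches or have matched only a single word. Let's do that with a state variable, prev, which holds the run built so far.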
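alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'
words2 = alligned2.split()
keys = []
prev = ''  # current run of consecutive matches, '' when not in a run
for i, word in enumerate(alligned1.split()):
    if word == words2[i]:
        # extend the current run, or start a new one
        prev = prev + ' ' + word if prev else word
    elif prev:
        # a mismatch ends the run
        keys.append(prev)
        prev = ''
if prev:
    # flush a run that reaches the end of the string
    keys.append(prev)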
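The run-tracking idea can also be packaged as a function. This is only a sketch: consecutive_matches is a hypothetical helper name, and it collects each run in a list instead of concatenating strings. Note that the last run has to be flushed after the loop, otherwise a match at the very end of the strings (here, 'show') is lost.

```python
def consecutive_matches(s1, s2):
    """Return runs of words that are equal at the same index in s1 and s2.

    Hypothetical helper for illustration; assumes both strings are
    whitespace-separated and have the same number of words.
    """
    runs = []
    current = []  # words of the run being built
    for w1, w2 in zip(s1.split(), s2.split()):
        if w1 == w2:
            current.append(w1)
        elif current:
            # a mismatch closes the current run
            runs.append(' '.join(current))
            current = []
    if current:
        # flush a run that extends to the last word
        runs.append(' '.join(current))
    return runs


print(consecutive_matches('I am going to go to some show',
                          'I am not going to go the show'))
# ['I am', 'show']
```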
Answer 2 (score: 1)
Well, Kevin's answer is the best and the neatest. I tried to do it the brute-force way. It doesn't look as nice, but it needs no imports.
alligned1 = 'I am going to go to some show'.split(' ')
alligned2 = 'I am not going to go the show'.split(' ')
keys = []
# keep a word where both strings agree at that index, otherwise None
temp = [v if v == alligned1[i] else None for i, v in enumerate(alligned2)]
temp.append(None)  # sentinel so the final run is flushed
tmpstr = ''
for i in temp:
    if i:
        tmpstr += i + ' '
    else:
        if tmpstr:
            keys.append(tmpstr)
        tmpstr = ''
keys = [i.strip() for i in keys]
print(keys)
Output:
['I am', 'show']
Answer 3 (score: 0)
Maybe not very elegant, but it works:
from itertools import zip_longest  # named izip_longest on Python 2

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'
curr_match = ''
matches = []
for w1, w2 in zip_longest(alligned1.split(), alligned2.split()):
    if w1 != w2:
        # a mismatch ends the current run, if there is one
        if curr_match:
            matches.append(curr_match)
            curr_match = ''
        continue
    if curr_match:
        curr_match += ' '
    curr_match += w1
# flush a run that reaches the end
if curr_match:
    matches.append(curr_match)
print(matches)
Result:
['I am', 'show']
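The choice of the longest-zip function (izip_longest on Python 2, zip_longest on Python 3) only matters if the two strings do not have the same number of words, which the question rules out. As a small sketch of the difference, with made-up inputs short and longer:

```python
from itertools import zip_longest

short = 'I am here'.split()
longer = 'I am here now'.split()

# zip stops at the shorter input, so the extra word is silently ignored
zipped = list(zip(short, longer))

# zip_longest pads the shorter input with None, so the extra word
# shows up as a pair that can never compare equal
padded = list(zip_longest(short, longer))

print(zipped)
print(padded)
```

Because None is never equal to a word, a padded pair simply counts as a mismatch and closes any run in progress.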