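Python: match words with the same index in two strings

Time: 2015-04-21 15:15:07

Tags: python string matching

I have two strings of equal length and want to match the words that have the same index. I am also trying to group consecutive matches together, which is where I am having trouble.

For example, I have two strings: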

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

The result I am looking for is:

['I am','show']

My current code is as follows:

keys = []
for x in alligned1.split():
    for i in alligned2.split():
        if x == i:
            keys.append(x)

This gives me:

['I','am','show']

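Any guidance or help would be greatly appreciated.

4 answers:

Answer 0: (score: 10)

Finding the matching words is fairly simple, but putting them into contiguous groups is fairly tricky. I suggest using itertools.groupby: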

import itertools

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

results = []
word_pairs = zip(alligned1.split(), alligned2.split())
for k, v in itertools.groupby(word_pairs, key = lambda pair: pair[0] == pair[1]):
    if k: 
        words = [pair[0] for pair in v]
        results.append(" ".join(words))

print(results)

Result:

['I am', 'show']
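To see why this works, it can help to look at what groupby actually yields before the non-matching groups are filtered out. The following is a minimal Python 3 sketch for illustration only; the `groups` variable is not part of the answer's code.

```python
from itertools import groupby

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

pairs = zip(alligned1.split(), alligned2.split())

# groupby starts a new group whenever the key changes. Each group is
# turned into a list right away, because the group iterators share the
# underlying iterator and are no longer usable once groupby moves on.
groups = [(same, [w1 for w1, _ in run])
          for same, run in groupby(pairs, key=lambda pair: pair[0] == pair[1])]

for same, words in groups:
    print(same, words)
# True ['I', 'am']
# False ['going', 'to', 'go', 'to', 'some']
# True ['show']
```

Keeping only the groups whose key is True and joining their words gives the result shown above.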

Answer 1: (score: 3)

A simplification of your code would be:

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

keys = []
for i, word in enumerate(alligned1.split()): 
    if word == alligned2.split()[i]:
        keys.append(word)
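The same index-by-index comparison can also be written with zip, which pairs up the words directly and avoids calling split() again on every iteration. This is only a sketch of an alternative spelling in Python 3, not something the original code requires.

```python
alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

# zip yields the words at index 0 of both lists, then index 1, and so on,
# so no explicit index variable is needed.
keys = [w1 for w1, w2 in zip(alligned1.split(), alligned2.split()) if w1 == w2]

print(keys)  # ['I', 'am', 'show']
```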

Next we need to keep track of whether we have just matched a word. Let's do that with a flag variable:

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

keys = []
prev = ''
for i, word in enumerate(alligned1.split()): 
    if word == alligned2.split()[i]:
        prev = prev + ' ' + word if prev else word

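    elif prev:
        # a mismatch ends the current run, so store it
        keys.append(prev)
        prev = ''

# store a run that reaches the last word (here 'show')
if prev:
    keys.append(prev)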
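To try the loop on other inputs, it can be wrapped in a function. The sketch below is Python 3; the name `matching_runs` is hypothetical, and it uses zip rather than indexing. The check after the loop matters whenever the last words of both strings match, as 'show' does in this example.

```python
def matching_runs(s1, s2):
    """Return the runs of consecutive words that match at the same index."""
    keys = []
    prev = ''
    for w1, w2 in zip(s1.split(), s2.split()):
        if w1 == w2:
            # extend the current run, or start a new one
            prev = prev + ' ' + w1 if prev else w1
        elif prev:
            # a mismatch closes the current run
            keys.append(prev)
            prev = ''
    if prev:
        # a run that lasts until the final word is still pending here
        keys.append(prev)
    return keys


print(matching_runs('I am going to go to some show',
                    'I am not going to go the show'))
# ['I am', 'show']
```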

Answer 2: (score: 1)

Kevin's answer is the best one. I tried to do it the brute-force way. It doesn't look as nice, but it needs no imports:

alligned1 = 'I am going to go to some show'.split(' ')
alligned2 = 'I am not going to go the show'.split(' ')
keys = []
temp = [v if v==alligned1[i] else None for i,v in enumerate(alligned2) ]
temp.append(None)
tmpstr = ''
for i in temp:
    if i:
        tmpstr+=i+' '
    else:
        if tmpstr: keys.append(tmpstr)
        tmpstr = ''
keys =  [i.strip() for i in keys]
print(keys)

Output:

['I am', 'show']

Answer 3: (score: 0)

Maybe not very elegant, but it works:

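try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3 renamed it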

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

curr_match = ''
matches = []
for w1, w2 in izip_longest(alligned1.split(), alligned2.split()):
    if w1 != w2:
        if curr_match:
            matches.append(curr_match)
            curr_match = ''
        continue
    if curr_match:
        curr_match += ' '
    curr_match += w1
if curr_match:
    matches.append(curr_match)

print(matches)

Result:

['I am', 'show']
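The question says both strings have the same number of words, in which case plain zip behaves the same way. The difference only shows up with unequal lengths: zip stops at the shorter input, while zip_longest pads the shorter one with None, so a trailing unmatched word is treated as a mismatch. A small Python 3 sketch with made-up example strings:

```python
from itertools import zip_longest  # named izip_longest in Python 2

# Hypothetical inputs of different lengths, for illustration only.
short = 'I am here'
longer = 'I am here too'

zipped = list(zip(short.split(), longer.split()))
padded = list(zip_longest(short.split(), longer.split()))

print(zipped)
# [('I', 'I'), ('am', 'am'), ('here', 'here')]
print(padded)
# [('I', 'I'), ('am', 'am'), ('here', 'here'), (None, 'too')]
```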