Finding common subsequences in lists

Time: 2014-05-26 14:37:49

Tags: python nltk

If I have two lists, for example

list1 = ['cat', 'sat', 'on', 'mat', 'xx', 'yy']
list2 = ['cow', 'sat', 'on', 'carpet', 'xx', 'yy']

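I walk along both lists in parallel: when I see two matching elements, I start counting. When I see a pair that does not match, I stop that counter and start another one.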

(sat, sat) I = 1

(on, on) I = 2

(mat, carpet) J = 1

(xx, xx) K = 1

(yy, yy) K = 2

i = 0
for x in list1:
    for y in list2:
        if x == y:
            print (x, y)
            i += 1
        else:
            j = 0
            j += 1
            print (x, y)

2 answers:

Answer 0 (score: 0)

How about the following:

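def doit(list1, list2):
    lastmatch = -1    # index of the most recent matching pair
    lastunmatch = -1  # index of the most recent non-matching pair
    for i, x in enumerate(zip(list1, list2)):
        if x[0] == x[1]:
            lastmatch = i
        else:
            lastunmatch = i
        # The gap between the two indices is the length of the current run.
        print(abs(lastmatch - lastunmatch))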

Running here: http://ideone.com/xnJWtz
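
If you would rather collect the values than print them, the same index trick can be wrapped in a function that returns a list. This is only a sketch of the idea above, not a tested replacement; the name running_counts is made up for illustration. Note that it also counts the leading mismatch (cat, cow), which the example in the question skips.

```python
def running_counts(list1, list2):
    """Return the running counter value after each aligned pair."""
    last_match = -1    # index of the most recent matching pair
    last_unmatch = -1  # index of the most recent non-matching pair
    counts = []
    for i, (a, b) in enumerate(zip(list1, list2)):
        if a == b:
            last_match = i
        else:
            last_unmatch = i
        # Distance between the two indices = length of the current run so far.
        counts.append(abs(last_match - last_unmatch))
    return counts


list1 = ['cat', 'sat', 'on', 'mat', 'xx', 'yy']
list2 = ['cow', 'sat', 'on', 'carpet', 'xx', 'yy']
print(running_counts(list1, list2))  # [1, 1, 2, 1, 1, 2]
```

The last value of each run (2, 1, 2 after the leading 1) corresponds to the final I, J and K values in the question.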

Answer 1 (score: 0)

>>> from collections import defaultdict
>>>
>>> list1 = ['cat', 'sat', 'on', 'mat', 'xx', 'yy']
>>> list2 = ['cow', 'sat', 'on', 'carpet', 'xx', 'yy']
>>>
>>> var_it = iter('IJKLMNOPQRSTUVWXYZ') # variable candidates
>>> counters = defaultdict(int)
>>> c = next(var_it)
>>> for word1, word2 in zip(list1, list2):
...     if word1 == word2:
...         counters[c] += 1
...     else:
...         if counters: # Prevent counting until first match
...             counters[next(var_it)] = 1
...             c = next(var_it)
...
>>> for var in sorted(counters):
...     print('{}: {}'.format(var, counters[var]))
...
I: 2
J: 1
K: 2
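
As an alternative sketch (not part of the session above), the runs can also be grouped with itertools.groupby, which avoids handing out counter names by hand. The function name run_lengths is hypothetical. Unlike the loop above, this version treats several consecutive mismatches as one run, which may or may not be what you want.

```python
from itertools import groupby


def run_lengths(list1, list2):
    """Return (matched, length) for each run of consecutive equal/unequal pairs."""
    flags = (a == b for a, b in zip(list1, list2))
    # groupby groups adjacent identical flags; count the items in each group.
    return [(matched, sum(1 for _ in group)) for matched, group in groupby(flags)]


list1 = ['cat', 'sat', 'on', 'mat', 'xx', 'yy']
list2 = ['cow', 'sat', 'on', 'carpet', 'xx', 'yy']

runs = run_lengths(list1, list2)
# Drop a leading mismatch run so that counting starts at the first match,
# as in the example in the question.
if runs and not runs[0][0]:
    runs = runs[1:]
print(runs)  # [(True, 2), (False, 1), (True, 2)]
```

The lengths 2, 1, 2 line up with I, J and K in the output above.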