Question

I'm working on a project that uses OCR. After some processing, I end up with two strings:

s1 = "This text is a test of"
s2 = "a test of the reading device"

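I'd like to know how to remove the repeated words from the second string. My idea was to find the position of the repeated words in each list. I tried this: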

e1 = [x for x in s1.split()]
e2 = [y for y in s2.split()]

for i, item2 in enumerate(e2):
    if item2 in e1:
        print i, item2 # repeated word and its index in the second list (e2)
        print e1.index(item2) # index of its first occurrence in the first list (e1)

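Now I have the repeated words and their positions in the first and second lists. I still need to compare them word by word to check whether they appear in the same order. This matters because the same word could occur two or more times in a string (a check I want for future cases).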

In the end I'd like to get final strings like these:

ns2 = "the reading device"
sf = "This text is a test of the reading device"

I'm using Python 2.7 on Windows 7.

Answer 1

Here is another approach:

from difflib import SequenceMatcher as sq
match = sq(None, s1, s2).find_longest_match(0, len(s1), 0, len(s2))

Result

print s1 + s2[match.b+match.size:]

This text is a test of the reading device
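The snippet above matches character by character, so in other inputs the longest match could begin or end in the middle of a word. A minimal word-level sketch of the same idea follows. The function name merge_on_longest_word_match is made up for illustration, and it assumes the longest shared run of words really is the overlap between the end of s1 and the start of s2; any words of s2 before that run are dropped.

```python
from difflib import SequenceMatcher


def merge_on_longest_word_match(s1, s2):
    """Return s1 followed by the words of s2 that come after the
    longest run of words the two strings share (hypothetical helper)."""
    e1, e2 = s1.split(), s2.split()
    # Compare lists of words instead of characters. autojunk is disabled
    # so that frequent words are never treated as junk on long inputs.
    matcher = SequenceMatcher(None, e1, e2, autojunk=False)
    match = matcher.find_longest_match(0, len(e1), 0, len(e2))
    # Keep all of s1, then only the words of s2 after the matched run.
    # With no shared word, match.size is 0 and all of s2 is appended.
    return " ".join(e1 + e2[match.b + match.size:])


s1 = "This text is a test of"
s2 = "a test of the reading device"
sf = merge_on_longest_word_match(s1, s2)
```

Here sf comes out as "This text is a test of the reading device". The code avoids print, so it should run unchanged on Python 2.7.1+ and Python 3.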

Answer 2

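Maybe this?

' '.join([x for x in s1.split(' ')] + [y for y in s2.split(' ') if y not in s1.split(' ')])

I haven't tested it carefully, but it might be a reasonable way to handle this kind of requirement. Note that it drops every word of s2 that appears anywhere in s1, regardless of position.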
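The membership test above removes a word from s2 wherever it occurs, which may be too aggressive when a word legitimately repeats. As a hedged alternative, the sketch below only removes words from s2 when they form a prefix of s2 that matches, word by word and in the same order, a suffix of s1. The name split_overlap is hypothetical, and the sketch assumes the duplicated text always sits at the end of s1 and the start of s2.

```python
def split_overlap(s1, s2):
    """Return (ns2, sf): s2 without the leading words that repeat the
    end of s1, and the merged string (hypothetical helper)."""
    e1, e2 = s1.split(), s2.split()
    # Try the longest possible overlap first, then shrink it.
    for size in range(min(len(e1), len(e2)), 0, -1):
        # Word-by-word, order-sensitive comparison of the two slices.
        if e1[-size:] == e2[:size]:
            rest = e2[size:]
            return " ".join(rest), " ".join(e1 + rest)
    # No overlap found: s2 is kept whole.
    return " ".join(e2), " ".join(e1 + e2)


ns2, sf = split_overlap("This text is a test of",
                        "a test of the reading device")
```

For the strings in the question this gives ns2 as "the reading device" and sf as "This text is a test of the reading device". With s1 = "a b a" and s2 = "a c a", only the first "a" of s2 is treated as overlap, so the trailing "a" survives in the merged string.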

How to remove duplicate words between two strings in Python?
