Question

I have thousands of strings (not in English) in the following format:

['MyWordMyWordSuffix', 'SameVocabularyItemMyWordSuffix']

I want to return the following:

['MyWordMyWordSuffix', 'SameVocabularyItem']

Because strings are immutable and I want to start the matching from the end, I keep confusing myself about how to approach it.

My best guess is some kind of loop that starts from the end of the strings and keeps checking for a match.
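The loop described above can be written directly. This is a minimal sketch, assuming exactly two strings per list; `common_suffix_length` and `strip_shared_suffix` are hypothetical names, not library functions. Immutability is not an obstacle: slicing returns a new string and leaves the original untouched.

```python
def common_suffix_length(a: str, b: str) -> int:
    """Count how many trailing characters a and b share (hypothetical helper)."""
    n = 0
    # Compare a[-1] with b[-1], a[-2] with b[-2], ... until the characters
    # differ or the shorter string runs out.
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def strip_shared_suffix(first: str, second: str) -> str:
    """Return `second` with the suffix it shares with `first` removed."""
    n = common_suffix_length(first, second)
    # Slicing builds a new string; neither input is modified.
    # Using len(second) - n (not -n) keeps the whole word when n == 0.
    return second[:len(second) - n]


words = ['MyWordMyWordSuffix', 'SameVocabularyItemMyWordSuffix']
print([words[0], strip_shared_suffix(words[0], words[1])])
# ['MyWordMyWordSuffix', 'SameVocabularyItem']
```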

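However, since I have so many of these to process, it seems like there should be a built-in way that is faster than looping through all the characters, but as I'm still learning Python I don't know of one (yet).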

The closest example I've found on SO so far can be found here, but it isn't really what I'm looking for.

Thank you for your help!

Answer 1

You can use commonprefix from os.path to find the common suffix between them:

from os.path import commonprefix

def getCommonSuffix(words):
    # get common suffix by reversing both words and finding the common prefix
    prefix = commonprefix([word[::-1] for word in words])
    return prefix[::-1]
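As an alternative to reversing the words and calling commonprefix, a hypothetical `common_suffix` helper can walk all the words backwards in lockstep with `zip` and `itertools.takewhile`, stopping at the first position where they disagree. This is a swapped-in technique, not the method above, and it has not been benchmarked against it; it does avoid building full reversed copies of each word.

```python
from itertools import takewhile


def common_suffix(words):
    """Common suffix of all strings in `words` (hypothetical helper)."""
    # Each tuple holds the last characters, then the second-to-last, ...,
    # stopping when the shortest word is exhausted.
    columns = zip(*(reversed(w) for w in words))
    # Keep only the leading columns where every word has the same character.
    same = takewhile(lambda col: len(set(col)) == 1, columns)
    # The characters were collected back to front, so reverse after joining.
    return ''.join(col[0] for col in same)[::-1]


print(common_suffix(['MyWordMyWordSuffix', 'SameVocabularyItemMyWordSuffix']))
# MyWordSuffix
```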

This can then be used to slice the suffix off the second string in the list:

word_list = ['MyWordMyWordSuffix', 'SameVocabularyItemMyWordSuffix']

suffix = getCommonSuffix(word_list)
if suffix:
    print("Found common suffix:", suffix)

    # filter out suffix from second word in the list
    word_list[1] = word_list[1][0:-len(suffix)]
    print("Filtered word list:", word_list)
else:
    print("No common suffix found")
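The slice `[0:-len(suffix)]` depends on the `if suffix:` guard: with an empty suffix it becomes `[0:-0]`, which equals `[0:0]` and returns an empty string. A sketch using `str.removesuffix`, assuming Python 3.9 or later (where that method exists), handles the empty case without a guard; `strip_common_suffix` is a hypothetical name.

```python
from os.path import commonprefix


def strip_common_suffix(words):
    """Return a new list with the shared suffix removed from words[1]."""
    # Same reverse-then-commonprefix trick as getCommonSuffix above.
    suffix = commonprefix([w[::-1] for w in words])[::-1]
    # str.removesuffix (Python 3.9+) returns the string unchanged when
    # suffix is '' or when the string does not end with it.
    return [words[0], words[1].removesuffix(suffix)]


print(strip_common_suffix(['MyWordMyWordSuffix', 'SameVocabularyItemMyWordSuffix']))
# ['MyWordMyWordSuffix', 'SameVocabularyItem']
```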

Output:

Found common suffix: MyWordSuffix
Filtered word list: ['MyWordMyWordSuffix', 'SameVocabularyItem']

Demo: https://repl.it/@glhr/55705902-common-suffix
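Since the question involves thousands of such lists, the same idea can be applied to each pair in a list comprehension. This is a minimal sketch; `strip_pair` is a hypothetical helper, and the two extra sample pairs are made up for illustration. Slicing with `len(second) - len(suffix)` instead of `-len(suffix)` keeps an empty suffix from wiping out the word, and it does not require Python 3.9.

```python
from os.path import commonprefix


def strip_pair(pair):
    """Remove from pair[1] the suffix it shares with pair[0] (hypothetical)."""
    first, second = pair
    # Common suffix = reversed common prefix of the reversed strings.
    suffix = commonprefix([first[::-1], second[::-1]])[::-1]
    # Safe when suffix == '': the slice then keeps all of `second`.
    return [first, second[:len(second) - len(suffix)]]


pairs = [
    ['MyWordMyWordSuffix', 'SameVocabularyItemMyWordSuffix'],
    ['abcEnding', 'xyzEnding'],   # hypothetical sample data
    ['noMatch', 'different'],     # hypothetical: no shared suffix
]
cleaned = [strip_pair(p) for p in pairs]
print(cleaned)
# [['MyWordMyWordSuffix', 'SameVocabularyItem'], ['abcEnding', 'xyz'], ['noMatch', 'different']]
```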

How to strip/remove characters from the end of a string that match the end of another string
