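I am looking for a way to strip the longest repeated pattern from a list of strings.
I have a list of about 1000 web page titles, and they all share a common suffix, which is the name of the website.
They follow this pattern: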
['art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge',
'coffee shop - confort and food | expand knowledge',
...
]
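How can I automatically strip the common suffix " | expand knowledge" from all of the strings?
Thanks!
Edit: sorry, I did not make myself clear.
I have no information about the " | expand knowledge" suffix in advance.
I want to be able to strip a potential common suffix from a list of strings even when I do not know what it is.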
Answer 0 (score: 4)
Here is a solution that uses the os.path.commonprefix function on the reversed titles:
import os

titles = ['art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge',
'coffee shop - confort and food | expand knowledge',
]
# Find the longest common suffix by reversing the strings and using a
# library function to find the common "prefix".
common_suffix = os.path.commonprefix([title[::-1] for title in titles])[::-1]
# Strip as many characters from the end of each title as the common suffix has.
# An explicit end index is used because title[:-0] would return ''.
stripped_titles = [title[:len(title) - len(common_suffix)] for title in titles]
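Result:
['art gallery - museum and visits', 'lasergame - entertainment', 'coffee shop - confort and food']
Because it finds the common suffix on its own, it should work for any set of titles, even when you do not know the suffix in advance.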
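The same idea can be wrapped in a small helper. This is only a sketch: the name strip_common_suffix is made up for illustration, and it assumes the list holds at least two titles (with a single title, the whole string counts as the "common" suffix).

```python
import os


def strip_common_suffix(titles):
    """Return the titles with their longest common suffix removed.

    Hypothetical helper, not part of any library.
    """
    if not titles:
        return []
    # Reverse each title so the shared suffix becomes a shared prefix.
    reversed_titles = [title[::-1] for title in titles]
    suffix_len = len(os.path.commonprefix(reversed_titles))
    # Use an explicit end index: title[:-0] would return '' when
    # there is no common suffix at all.
    return [title[:len(title) - suffix_len] for title in titles]


titles = [
    'art gallery - museum and visits | expand knowledge',
    'lasergame - entertainment | expand knowledge',
    'coffee shop - confort and food | expand knowledge',
]
stripped = strip_common_suffix(titles)
no_shared_suffix = strip_common_suffix(['abc', 'xyz'])
```

Note that the comparison is character by character, so if every title happened to end with the same letter just before the site name, that letter would be removed as well.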
Answer 1 (score: 1)
If you are sure that all of the strings share the common suffix, this will do:
strings = [
'art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge']
suffixlen = len(" | expand knowledge")
print([s[:-suffixlen] for s in strings])
Output:
['art gallery - museum and visits', 'lasergame - entertainment']
Answer 2 (score: 0)
If you do know the suffix you want to remove, you can do the following:
suffix = " | expand knowledge"
your_list = ['art gallery - museum and visits | expand knowledge',
'lasergame - entertainment | expand knowledge',
'coffee shop - confort and food | expand knowledge',
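# ...
]
# str.rstrip(suffix) would strip any trailing characters contained in suffix,
# not the suffix as a whole, so check with endswith and slice instead.
new_list = [name[:-len(suffix)] if name.endswith(suffix) else name for name in your_list]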
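A pitfall worth spelling out when removing a known suffix: str.rstrip treats its argument as a set of characters, not as a substring. The sketch below shows the difference on one of the titles from the question; remove_suffix is a hypothetical helper name, not a built-in.

```python
suffix = " | expand knowledge"
title = "coffee shop - confort and food | expand knowledge"

# rstrip keeps removing trailing characters as long as they appear
# anywhere in `suffix`, so it also eats the "ood" of "food".
over_stripped = title.rstrip(suffix)


def remove_suffix(text, suffix):
    """Remove `suffix` from the end of `text` only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


exact = remove_suffix(title, suffix)
print(repr(over_stripped))  # 'coffee shop - confort and f'
print(repr(exact))          # 'coffee shop - confort and food'
```

On Python 3.9 and later, the built-in str.removesuffix does the same thing as the helper above.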