Question

I want to remove all words that contain a specific substring.

Sentence = 'walking my dog https://github.com/'
substring = 'http'

# Remove all words that start with the substring
#...

result = 'walking my dog'

Answer 1

This preserves the original spacing in the string without much fuss.

import re
string = "a suspect http://string.com   with spaces before and after"
starts = "http"
re.sub(f"\\b{starts}[^ ]*[ ]+", "", string)
'a suspect with spaces before and after'
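
Note that the pattern above needs at least one space after the matched word, so a word at the very end of the string (as in the question's example) is left in place, and regex metacharacters in the substring are not escaped. A possible variant that covers both cases might look like the sketch below. The function name `remove_words_starting_with` is made up for illustration, and the sketch assumes that a "word" is any run of non-whitespace characters.

```python
import re

def remove_words_starting_with(text: str, prefix: str) -> str:
    # Hypothetical helper, not part of the original answer.
    p = re.escape(prefix)  # treat the prefix literally, even if it contains ".", "+", etc.
    # First alternative: a word starting with the prefix, plus the spaces after it.
    # Second alternative: such a word at the very end, plus the spaces before it.
    # (?<!\S) means "not preceded by a non-whitespace character", i.e. start of a word.
    pattern = rf"(?<!\S){p}\S*[ ]+|[ ]*(?<!\S){p}\S*$"
    return re.sub(pattern, "", text)

print(remove_words_starting_with("walking my dog https://github.com/", "http"))
```

Unlike `\b`, the lookbehind only accepts matches at the start of a whitespace-delimited word, so something like `foo/http` is kept.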

Answer 2

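We can use a simple approach.

Split sentence into words
Go through all the words
Check whether each word contains substring and drop it if it does
Join the remaining words back together.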

>>> sentence = 'walking my dog https://github.com/'
>>> substring = 'http'
>>> f = lambda v, w: ' '.join(filter(lambda x: w not in x, v.split(' ')))
>>> f(sentence, substring)
'walking my dog'
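
The lambda above drops every word that contains the substring anywhere. Since the comment in the question asks about words that start with the substring, a stricter variant could use `str.startswith` instead. This is only a sketch, and the function name `drop_words_with_prefix` is an assumption made for illustration.

```python
def drop_words_with_prefix(sentence: str, prefix: str) -> str:
    # Hypothetical helper: keep only the words that do not begin with the prefix.
    kept = [word for word in sentence.split(' ') if not word.startswith(prefix)]
    return ' '.join(kept)

print(drop_words_with_prefix('walking my dog https://github.com/', 'http'))
```

With this version a word such as `xhttp` is kept, whereas the `not in` filter would remove it.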

Explanation:

1. ' '.join(
2.   filter(
3.     lambda x: w not in x,
4.     v.split(' ')
5.   )
6. )

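Line 1 joins the resulting strings. Line 2 filters the elements produced by line 4, which splits the string into words. The filter condition is substring not in word. In the worst case, not in performs an O(len(substring) * len(word)) comparison.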

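Note: the only step that can be sped up is line 3. You are comparing each word against a constant string, so you could use Rabin-Karp string matching to search in expected O(len(word)) time, or the Z-function to search in O(len(word) + len(substring)) time.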
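
To illustrate the Z-function idea from the note, here is a minimal sketch of a linear-time containment check. The names `z_function` and `contains` are made up for this example. In CPython the built-in `in` operator is implemented in C and is usually faster in practice, so treat this as an illustration of the technique rather than a recommended replacement.

```python
def z_function(s: str) -> list[int]:
    # z[i] is the length of the longest common prefix of s and s[i:]; z[0] is left at 0.
    n = len(s)
    z = [0] * n
    left, right = 0, 0  # [left, right) is the rightmost matching window found so far
    for i in range(1, n):
        if i < right:
            # Reuse what is already known from inside the window.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z

def contains(word: str, substring: str) -> bool:
    # Hypothetical stand-in for `substring in word`.
    if not substring:
        return True  # same result as `'' in word`
    m = len(substring)
    # The "\0" separator keeps the pattern and the word apart in the combined string.
    z = z_function(substring + "\0" + word)
    # A value >= m inside the word part means the substring occurs at that position.
    return any(value >= m for value in z[m + 1:])

sentence = 'walking my dog https://github.com/'
print(' '.join(w for w in sentence.split(' ') if not contains(w, 'http')))
```

The whole check runs in O(len(word) + len(substring)) per word, because each character comparison in `z_function` either extends the window or is skipped by reusing an earlier value.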
