Question

I have a list of word phrases and a string, as follows.

mylist = ['and rock', 'shake well', 'the']
mystring = "the sand rock need to be mixed and shake well"

I want to replace each phrase in mylist with an empty string ("").

I am currently using Python's str.replace method, as follows:

for item in mylist:
    mystring = mystring.replace(item, "")

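However, I noticed that it doesn't work correctly for all of my sentences. For example, in mystring it makes a false match on 'and rock' inside 'sand rock', so the output is: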

' s need to be mixed and '

Instead, I want it to be:

sand rock need to be mixed and

Is there a better way to do this in Python?

Answer 1

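The problem is that str.replace() doesn't let you say that you only want to match whole words (or phrases). The re module lets you do pattern matching with regular expressions (regexes). In a regular expression, the \b escape matches a word boundary. Put \b before and after the phrase so that a match can only occur at word boundaries. The re.sub() function works much like the str.replace() method, and you can use it in your code like this: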

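import re

mylist = ['and rock', 'shake well', 'the']
mystring = "the sand rock need to be mixed and shake well"
for item in mylist:
    # \b on each side restricts the match to whole words;
    # re.escape makes any regex metacharacters in the phrase match literally
    mystring = re.sub(r"\b{}\b".format(re.escape(item)), "", mystring)
print(repr(mystring))

Output:
' sand rock need to be mixed and '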
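The result above still has a leading and a trailing space, and removing a phrase from the middle of a sentence would leave a double space, whereas the question asks for exactly `sand rock need to be mixed and`. A minimal sketch, assuming it is acceptable to collapse every run of whitespace into a single space, is to normalize the string after the substitutions. The helper name `remove_phrases` is hypothetical:

```python
import re

def remove_phrases(text, phrases):
    """Remove whole-word occurrences of each phrase, then tidy up whitespace."""
    for phrase in phrases:
        # re.escape keeps characters such as '.' or '+' in a phrase literal
        text = re.sub(r"\b{}\b".format(re.escape(phrase)), "", text)
    # split() with no argument splits on any whitespace run and drops empty
    # strings, so joining with single spaces also trims both ends
    return " ".join(text.split())

mylist = ['and rock', 'shake well', 'the']
mystring = "the sand rock need to be mixed and shake well"
print(remove_phrases(mystring, mylist))  # sand rock need to be mixed and
```

Note that this also changes tabs or newlines inside the text into single spaces; if that matters, a narrower cleanup such as `re.sub(r" {2,}", " ", text).strip()` is an alternative.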

Answer 2

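Your problem comes down to the fact that you don't want to match partial words, which is why the replace() method can't do what you want. You can get the behavior you need with regular expressions: the \b escape (a word-boundary anchor) matches only at the edge of a word.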
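In Python's `re`, `\b` matches at a position between a word character (`\w`: letters, digits, underscore) and a non-word character or the edge of the string. The sketch below, built around a hypothetical helper `contains_whole`, checks a few cases with `re.search`. It also shows one caveat worth knowing: if a phrase begins or ends with a non-word character (for example `c++`), `\b` at that end requires a word character on the other side, so a whole-word match can fail.

```python
import re

def contains_whole(phrase, text):
    """True if `phrase` occurs in `text` with \\b word boundaries on both sides."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text) is not None

print(contains_whole("and rock", "sand rock"))      # False: 's' and 'a' are both word characters
print(contains_whole("and rock", "hard and rock"))  # True
print("and rock" in "sand rock")                    # True: plain substring test, like str.replace
print(contains_whole("c++", "I use c++ daily"))     # False: no boundary between '+' and ' '
```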

Answer 3

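Use re.sub and put \b (word boundary) anchors around all of the alternatives so that only complete words and phrases are matched, in a single pass: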

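import re
# Group the alternatives so \b applies to every phrase, not just the first one;
# the r'' prefix keeps \b a word boundary instead of a backspace character
pattern = r'\b(?:' + '|'.join(map(re.escape, mylist)) + r')\b'
re.sub(pattern, '', mystring)
#' sand rock need to be mixed and '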
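Because regex alternation tries its branches left to right and stops at the first one that succeeds, the order of the phrases matters when one phrase is a prefix of another. A sketch using made-up phrases `'new'` and `'new york'` (not from the question), with a hypothetical helper `build_pattern` that sorts the phrases longest first before building the single pattern:

```python
import re

def build_pattern(phrases):
    # Longest phrases first so 'new york' is tried before its prefix 'new'
    ordered = sorted(phrases, key=len, reverse=True)
    return r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b"

text = "new york is new"
unsorted = r"\b(?:" + "|".join(map(re.escape, ["new", "new york"])) + r")\b"
print(repr(re.sub(unsorted, "", text)))                            # ' york is '
print(repr(re.sub(build_pattern(["new", "new york"]), "", text)))  # ' is '
```

With the unsorted list, `new` wins at position 0 and leaves `york` behind; sorting by length avoids that without changing the word-boundary logic.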

Identifying certain word phrases in a string in Python
