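I am trying to remove substrings from each element in a list of strings. I can't figure out how to handle strings that contain more than one of the substrings (stop words) I want to remove.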
wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")
result = []
for wine in wines:
    for stop in stop_words:
        if stop in wine:
            x = wine.replace(stop, "")
    result.append(x)
print(result)
Changing the if statement to a for or a while either returns garbage or hangs. Any suggestions?
Answer 0 (score: 3)
A small change to the indentation and to the variables solves your problem:
for wine in wines:
    glass = wine  # Let's pour your wine into a glass
    for stop in stop_words:
        if stop in glass:  # Is stop in your glass?
            # Replace stop in the glass and pour the result back into the glass
            glass = glass.replace(stop, "")
    result.append(glass)  # Finally, pour the contents of your glass into result
>>> result
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
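To make the difference concrete, here is a minimal sketch that runs the question's logic and the corrected logic side by side on the last wine. The helper name `remove_stop_words` is my own and not from the question. It also drops the `if`, on the assumption that an unconditional `str.replace` is acceptable, since it leaves the string unchanged when the stop word is absent.

```python
wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")


def remove_stop_words(text, stops):
    """Remove every stop word from text, carrying each replacement forward."""
    for stop in stops:
        # str.replace returns the string unchanged when stop is absent,
        # so no membership test is needed first.
        text = text.replace(stop, "")
    return text


# The question's loop always replaces in the untouched original string,
# so only the last matching stop word ends up removed.
original = wines[2]
buggy = original
for stop in stop_words:
    if stop in original:
        buggy = original.replace(stop, "")

fixed = [remove_stop_words(wine, stop_words) for wine in wines]

print(repr(buggy))  # 'Bordeaux 2005 '
print(fixed)        # [' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
```

The leftover spaces are expected: only the stop words themselves are removed, not the whitespace around them.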
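If you are feeling adventurous, you can use a regular expression. I believe a regex may be faster than a simple loop in this case, although I have not measured it.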
>>> import re
>>> result = []
>>> for wine in wines:
...     result.append(re.sub('(' + '|'.join(stop_words) + ')', '', wine))
...
>>> result
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
Or write it as a list comprehension:
>>> [re.sub('(' + '|'.join(stop_words) + ')', '', wine) for wine in wines]
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
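One caveat with joining the stop words directly into a pattern: it only works while none of them contains a regex metacharacter. A hedged sketch of a more defensive variant follows. It escapes each stop word with `re.escape`, compiles the pattern once, and sorts longest first so that a longer stop word is tried before a shorter one it contains. The "St. Emilion" string is a made-up example, not from the question.

```python
import re

wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")

# Longest first, so a longer stop word is tried before any shorter one inside it.
ordered = sorted(stop_words, key=len, reverse=True)

# re.escape makes every stop word match literally.
pattern = re.compile("|".join(re.escape(stop) for stop in ordered))

cleaned = [pattern.sub("", wine) for wine in wines]
print(cleaned)  # [' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']

# Hypothetical stop word containing a metacharacter: unescaped, the "."
# matches any character, so an unrelated string is altered as well.
unescaped = re.sub("St. Emilion", "", "Stx Emilion 2005")
escaped = re.sub(re.escape("St. Emilion"), "", "Stx Emilion 2005")
print(repr(unescaped))  # ' 2005'
print(repr(escaped))    # 'Stx Emilion 2005'
```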
Answer 1 (score: 1)
wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")
result = []
for wine in wines:
    x = wine
    for stop in stop_words:
        x = x.replace(stop, "")
    result.append(x)
print(result)
Using a regex would be better, IMO:
>>> wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
>>> stop_words = ("2005", "2008", "2009", "Cotes du Rhone")
>>> import re
>>> [re.sub('|'.join(stop_words),'',wine) for wine in wines]
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
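The results above still carry the spaces that surrounded the removed words. If those are unwanted, one possible cleanup (my assumption about the desired output, not something the question specifies) is to split on whitespace and rejoin with single spaces after the substitution:

```python
import re

wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")

pattern = re.compile("|".join(map(re.escape, stop_words)))

# str.split() with no arguments drops leading and trailing whitespace and
# collapses runs of whitespace, so rejoining gives single-spaced text.
tidy = [" ".join(pattern.sub("", wine).split()) for wine in wines]
print(tidy)  # ['Chardonnay', 'Cabernet Sauvignon', 'Bordeaux']
```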
Answer 2 (score: 0)
As a one-liner, taking jamylak's suggestion to use strip():
[reduce(lambda x,y: x.replace(y, "").strip(), stop_words, wine) for wine in wines]
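Note that this works as written in Python 2.x but not in Python 3, where reduce() is no longer a built-in and has moved to the functools module. If you are using Python 3, do the following: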
import functools as ft
[ft.reduce(lambda x,y: x.replace(y, "").strip(), stop_words, wine) for wine in wines]
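For code that has to run on both versions, a small sketch: `functools.reduce` also exists in Python 2.6 and later, so importing it explicitly works on either interpreter. The named helper `strip_stop` is my own and simply replaces the lambda for readability.

```python
from functools import reduce  # importable in Python 2.6+ and in Python 3

wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")


def strip_stop(text, stop):
    """Remove one stop word, then trim whitespace from both ends."""
    return text.replace(stop, "").strip()


# reduce feeds the running result into strip_stop for each stop word in turn,
# starting from the original wine string.
result = [reduce(strip_stop, stop_words, wine) for wine in wines]
print(result)  # ['Chardonnay', 'Cabernet Sauvignon', 'Bordeaux']
```

Keep in mind that strip() only trims the ends of the string, so a stop word removed from the middle still leaves a double space behind.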