Please help me.
So far I have completed step 1 (see the code below), and it works well:
stopwords = ['what', 'hello', 'and', 'at', 'is', 'am', 'i']
search_list = ['where is north and northern side',
               'ask in the community at western environmental',
               'my name is alan and i am coming from london southeast']
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
result = [' '.join(w for w in place.split() if w.lower() not in stopwords)
          for place in search_list]
print(result)
For step 2 I need the final output shown below. What should I change or add in the code above to get it? Any alternative approach is welcome too.
['where n n side', 'ask in the community w env', 'my name alan coming from london s']
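Answer 0 (score: 3)
You have to "invert" the dictionary, because the lookup goes the other way round (from the long word to its abbreviation):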
rev_dict = {v: k for k, words in dictionary.items() for v in words}
Now the replacement is easy, since every long form maps directly to its abbreviation:
>>> rev_dict
{'east': 'e',
'eastern': 'e',
'enviornment': 'env',
'environ.': 'env',
'environmental': 'env',
'north': 'n',
'northern': 'n',
'south': 's',
'southern': 's',
'west': 'w',
'western': 'w'}
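For reference, here is the same inversion written as an explicit loop. This is only a sketch: the helper name `invert_abbreviations` and the duplicate check are my own additions, not part of the one-liner above. The check is there because a word listed under two abbreviations would silently keep only the last one in the comprehension.

```python
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}

def invert_abbreviations(mapping):
    """Return {long_form: abbreviation} (hypothetical helper name)."""
    inverted = {}
    for abbreviation, words in mapping.items():
        for word in words:
            # A word listed under two different abbreviations is ambiguous.
            if word in inverted and inverted[word] != abbreviation:
                raise ValueError(
                    f"{word!r} maps to both {inverted[word]!r} and {abbreviation!r}")
            inverted[word] = abbreviation
    return inverted

rev_dict = invert_abbreviations(dictionary)
print(rev_dict['northern'])  # n
```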
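Just split your strings again (to avoid the extra split, you could keep the word lists around) and replace each word, falling back to the word itself when there is no match: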
result = [" ".join([rev_dict.get(x, x) for x in s.split()]) for s in result]
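The "keep the word lists" idea mentioned above could look roughly like this. It is a sketch only; the intermediate names `word_lists`, `filtered` and `replaced` are made up for illustration. Each string is split once, both steps work on lists of words, and the join happens at the very end.

```python
stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}
search_list = ['where is north and northern side',
               'ask in the community at western environmental',
               'my name is alan and i am coming from london southeast']
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
rev_dict = {v: k for k, words in dictionary.items() for v in words}

# Split each string exactly once and keep the word lists.
word_lists = [s.split() for s in search_list]

# Step 1: drop stopwords (case-insensitive, as in the question).
filtered = [[w for w in words if w.lower() not in stopwords]
            for words in word_lists]

# Step 2: swap long forms for abbreviations; unknown words stay as they are.
replaced = [[rev_dict.get(w, w) for w in words] for words in filtered]

# Join back into strings only at the end.
result = [" ".join(words) for words in replaced]
print(result)
```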
Or combine stopword removal and replacement in a single pass:
stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}  # define as a set for fast lookup
result = [" ".join([rev_dict.get(x,x) for x in s.split() if x not in stopwords]) for s in search_list]
In both cases, the result is:
['where n n side', 'ask in the community w env', 'my name alan coming from london southeast']
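Note that the result above keeps `southeast`, while the desired output in the question ends with `london s`. `southeast` is not in the dictionary, so an exact lookup cannot match it. If prefix matching is what you want (that is my assumption, the question does not say so), a fallback along these lines might work. The function name `abbreviate` is hypothetical, and be aware that prefix matching can over-match (for example `easter` would become `e`).

```python
stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}
search_list = ['where is north and northern side',
               'ask in the community at western environmental',
               'my name is alan and i am coming from london southeast']
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
rev_dict = {v: k for k, words in dictionary.items() for v in words}

# Try longer dictionary words first so the most specific prefix wins.
prefixes = sorted(rev_dict, key=len, reverse=True)

def abbreviate(word):
    """Exact match first, then prefix match, else the word itself."""
    if word in rev_dict:
        return rev_dict[word]
    for prefix in prefixes:
        if word.startswith(prefix):
            return rev_dict[prefix]
    return word

result = [" ".join(abbreviate(x) for x in s.split() if x not in stopwords)
          for s in search_list]
print(result)
```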