Remove stopwords from a list and replace the remaining words

Date: 2018-08-11 17:37:23

Tags: python list dictionary replace split

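Please help me.

  1. I have a list of stopwords and a search list. I want to remove the stopwords from the search list.
  2. After step 1, I want to match each split word against the dictionary values. If a value matches, replace that word with the corresponding dictionary key, then join it back with the other words.

So far I have completed step 1 (see the code below). It works well: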

    stopwords=['what','hello','and','at','is','am','i']
    search_list=['where is north and northern side',
                 'ask in the community at western environmental',
                 'my name is alan and i am coming from london southeast']
    dictionary = {'n': ['north','northern'],
                  's': ['south','southern'],
                  'e': ['east','eastern'],
                  'w': ['west','western'],
                  'env': ['environ.','enviornment','environmental']}

    result = [' '.join(w for w in place.split() if w.lower() not in stopwords)
                for place in search_list]

    print(result)

To complete step 2, I need the final output below. What should I change or add in the code above to get it? Any alternative approaches are also welcome.

['where n n side', 'ask in the community w env', 'my name alan coming from london s']

1 answer:

Answer 0 (score: 3)

You have to "invert" the dictionary, because the lookup goes the other way (from value to key):

rev_dict = {v:k for k,l in dictionary.items() for v in l}

Now the replacement is convenient:

>>> rev_dict
{'east': 'e',
 'eastern': 'e',
 'enviornment': 'env',
 'environ.': 'env',
 'environmental': 'env',
 'north': 'n',
 'northern': 'n',
 'south': 's',
 'southern': 's',
 'west': 'w',
 'western': 'w'}
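As a small illustration of why the inverted mapping helps, here is a minimal sketch of a single-word lookup. It rebuilds `rev_dict` from the question's `dictionary` so that it runs on its own, and it relies only on the standard `dict.get(key, default)` behaviour.

```python
# Rebuilt from the question so the snippet is self-contained.
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
rev_dict = {v: k for k, l in dictionary.items() for v in l}

# dict.get(key, default) returns the default when the key is missing,
# so passing the word itself as the default leaves unknown words unchanged.
print(rev_dict.get('northern', 'northern'))  # n
print(rev_dict.get('side', 'side'))          # side
```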

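Split your strings again (to avoid splitting twice, you could keep the lists of words from step 1) and replace each word, using the word itself as the default when there is no match: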

result = [" ".join([rev_dict.get(x,x) for x in s.split() if x not in stopwords]) for s in search_list]
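The remark about keeping the word lists could look roughly like the sketch below. The name `filtered` is only an illustrative choice, not something from the question; the rest of the data is copied from it.

```python
stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}
search_list = ['where is north and northern side',
               'ask in the community at western environmental',
               'my name is alan and i am coming from london southeast']
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
rev_dict = {v: k for k, l in dictionary.items() for v in l}

# Step 1, kept as lists of words instead of joined strings.
# "filtered" is a hypothetical name used for illustration.
filtered = [[w for w in s.split() if w.lower() not in stopwords]
            for s in search_list]

# Step 2 works on those lists directly, so nothing is split a second time.
result = [" ".join(rev_dict.get(w, w) for w in words) for words in filtered]
print(result)
```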

Or, combining stopword removal and replacement, with the stopwords defined as a set for faster lookup:

stopwords={'what','hello','and','at','is','am','i'}  # define as a set for fast lookup
result = [" ".join([rev_dict.get(x,x) for x in s.split() if x not in stopwords]) for s in search_list]

In both cases, the result is:

['where n n side', 'ask in the community w env', 'my name alan coming from london southeast']
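Note that the last string keeps `southeast`, whereas the question asked for `s`: `southeast` is not one of the dictionary values, and only exact matches are replaced. If compound words like this should also be shortened, one possible approach is a prefix fallback. This is a sketch under the assumption that a word which merely starts with a known value should map to that value's key; the helper name `abbreviate` is hypothetical, and prefix matching would also shorten unrelated words such as `easter`, so it may not suit every data set.

```python
stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}
search_list = ['where is north and northern side',
               'ask in the community at western environmental',
               'my name is alan and i am coming from london southeast']
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
rev_dict = {v: k for k, l in dictionary.items() for v in l}


def abbreviate(word):
    """Hypothetical helper: exact match first, then longest matching prefix."""
    if word in rev_dict:
        return rev_dict[word]
    # Assumption: a word that starts with a known value maps to that value's key.
    prefixes = [v for v in rev_dict if word.startswith(v)]
    if prefixes:
        return rev_dict[max(prefixes, key=len)]
    # No exact or prefix match: keep the word unchanged.
    return word


result = [" ".join(abbreviate(x) for x in s.split() if x not in stopwords)
          for s in search_list]
print(result)
```

With the question's data, `southeast` starts with `south`, so the last string ends in `s` as requested.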