Extract the words or phrases from a list of strings that are not in a base string

Time: 2016-03-11 12:27:56

Tags: python python-3.x

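I want to build a script in Python that takes a base string and runs it against a list of other strings. The script should return a list of the words or phrases that appear in those strings but not in the base string.

Example: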

string = 'why kid is upset'

list_of_strings = ['why my kid is upset', 'why beautiful kid is upset',
                   'why my 15 years old kid is upset', 'why my kid is always upset']

should return

['my', 'beautiful', 'my 15 years old', 'always']

Which libraries would you suggest I look into to solve this problem?

3 answers:

Answer 0: (score: 0)

You don't need a special library. Just do this:

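def get_list(string, list_of_strings):
    split_list = string.split()  # words of the base string
    # For each candidate string, keep only the words that are not in the base string.
    return [" ".join(filter(lambda s: s not in split_list, item.split())) for item in list_of_strings]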

This may be a bit hard to read, so you can split it up:

def get_list(string, list_of_strings):
    split_list = string.split()
    new_list = []
    for string in list_of_strings:
        unseen_words = filter(lambda s: s not in split_list, string.split())
        unseen_sentence = " ".join(unseen_words)
        new_list.append(unseen_sentence)
    return new_list
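
As a quick check, the longer version can be run against the data from the question. This is only a minimal sketch: the function is repeated so the snippet runs on its own, and the names `base` and `sentence` are my own choices, not part of the answer.

```python
def get_list(string, list_of_strings):
    base_words = string.split()  # words of the base string
    new_list = []
    for sentence in list_of_strings:
        # Keep the words of this sentence that do not occur in the base string.
        unseen_words = [w for w in sentence.split() if w not in base_words]
        new_list.append(" ".join(unseen_words))
    return new_list


base = 'why kid is upset'
list_of_strings = ['why my kid is upset', 'why beautiful kid is upset',
                   'why my 15 years old kid is upset', 'why my kid is always upset']

print(get_list(base, list_of_strings))
# ['my', 'beautiful', 'my 15 years old', 'my always']
```

Note that the last entry is 'my always' rather than the 'always' shown in the question, because every word missing from the base string is kept, whether or not it appeared in an earlier string.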

Answer 1: (score: 0)

Update

This version adds all the words it has already seen to the exclude set:

exclude = set('why kid is upset'.split())
list_of_strings = ['why my kid is upset', 
                   'why beautiful kid is upset', 
                   'why my 15 years old kid is upset',
                   'why my kid is always upset']
res = []
for item in list_of_strings:
    words = item.split()
    res.append(' '.join(word for word in words if word not in exclude))
    exclude.update(set(words))
print(res)

Result:

['my', 'beautiful', '15 years old', 'always']

This works:

exclude = set('why kid is upset'.split())
list_of_strings = ['why my kid is upset', 
                   'why beautiful kid is upset', 
                   'why my 15 years old kid is upset',
                   'why my kid is always upset']
>>> [' '.join(word for word in item.split() if word not in exclude) for item
     in list_of_strings]
['my', 'beautiful', 'my 15 years old', 'my always']

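Answer 2: (score: 0)

I'm not sure what output format you need when the list of strings contains something like this: 'why my 15 years old kid is upset now'

Anyway, I have no library to point you to, but this small piece of code seems to solve your problem: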

def stringNOTinbase(base,los):
    basewords = set(base.split(" ") )
    res = []
    for string in los:
        res.append( " ".join( [word for word in string.split(" ") if word not in basewords  ]   )   )
    return res

If you define the variables and call it like this:

string = 'why kid is upset'

list_of_strings = ['why my kid is upset', 'why beautiful kid is upset', 'why my 15 years old kid is upset', 'why my kid is always upset','why my 15 years old kid is upset now']

print(stringNOTinbase(string, list_of_strings))

the call will output:

['my', 'beautiful', 'my 15 years old', 'my always', 'my 15 years old now']

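Explanation: I take the base string and build a set from its split words; then each string in the list is split into words, the words that are not in the set are collected into a new list, and that list is joined back together with spaces.

I hope it helps.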
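
One possible variation, in case separate phrases are wanted: joining all unseen words with spaces merges non-adjacent words into one entry such as 'my always'. The sketch below instead groups consecutive unseen words with `itertools.groupby` from the standard library, so each contiguous run becomes its own phrase. The function name `extra_phrases` is hypothetical, and this is an assumption about what the question is after, not the method described above.

```python
from itertools import groupby


def extra_phrases(base, sentence):
    """Return each contiguous run of words in `sentence` that is not in `base`."""
    base_words = set(base.split())
    phrases = []
    # groupby yields consecutive runs of words that share the same key value;
    # the key is True for words that are missing from the base string.
    for is_new, run in groupby(sentence.split(), key=lambda w: w not in base_words):
        if is_new:
            phrases.append(" ".join(run))
    return phrases


print(extra_phrases('why kid is upset', 'why my kid is always upset'))
# ['my', 'always']
print(extra_phrases('why kid is upset', 'why my 15 years old kid is upset'))
# ['my 15 years old']
```

This returns a list of phrases per string, so the caller decides whether to flatten or join the results.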