Comparing sub-elements of a list with another element

Time: 2015-10-13 11:28:15

Tags: python string list comparison nlp

I have a list of sentences, listOfSentences, as follows:

listOfSentences = ['mary had a little lamb.', 
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.', 
                   'she ate the lamb.']

I also have a dictionary of keywords, keyWords, as follows:

keyWords= {('bam', 3), ('lamb', 2), ('ate', 1)}

where the higher a word's frequency, the smaller its key in keyWords.

The desired output is:

>>> print(keySentences)
['bam bam bam she also loves ham.', 'she ate the lamb.']

My question is: how can I compare the elements of keyWords with the elements of listOfSentences so as to output the list keySentences?

3 answers:

Answer 0: (score: 1)

keyWords would be more useful as a dictionary: getting the score for each word is then a simple dictionary lookup. Each word can be extracted with split().
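As a small illustration of those two building blocks (a minimal sketch; the name `scores` is introduced here and is not from the question), dict() accepts any iterable of pairs, so the set of tuples from the question converts directly:

```python
# The set of (word, score) tuples exactly as written in the question
keyWords = {('bam', 3), ('lamb', 2), ('ate', 1)}

scores = dict(keyWords)  # hypothetical name: maps word -> score

print(scores['bam'])          # 3
print(scores.get('ham', 0))   # 0, because 'ham' is not a keyword

# split() breaks on whitespace only, so punctuation stays attached
print('she ate the lamb.'.split())  # ['she', 'ate', 'the', 'lamb.']
```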

Here is some code. It assumes that punctuation is part of the word (as the example result list keySentences suggests):

listOfSentences = ['mary had a little lamb.', 
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.', 
                   'she ate the lamb.']

keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)

keySentences = []
for sentence in listOfSentences:
    score = sum(keyWords.get(word, 0) for word in sentence.split())
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)

Output

['bam bam bam she also loves ham.', 'she ate the lamb.']

If you want to ignore punctuation, you can remove it from each sentence before processing:

import string

# mapping to remove punctuation with str.translate()
remove_punctuation = {ord(c): None for c in string.punctuation}

listOfSentences = ['mary had a little lamb.', 
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.', 
                   'she ate the lamb.']

keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)

keySentences = []
for sentence in listOfSentences:
    score = sum(keyWords.get(word, 0) for word in sentence.translate(remove_punctuation).split())
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)

Output

['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']

The result list now also contains 'mary had a little lamb.' because the full stop trailing 'lamb' was removed by str.translate().
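To see what the translation table does to a single sentence, here is a minimal sketch (the string `sample` is made up for this illustration). Note that str.translate() with this table deletes every punctuation character, including apostrophes inside words:

```python
import string

# Same mapping as above: each punctuation code point maps to None (deleted)
remove_punctuation = {ord(c): None for c in string.punctuation}

sample = "mary's lamb ate ham, then slept."  # hypothetical sentence
cleaned = sample.translate(remove_punctuation)

print(cleaned)          # marys lamb ate ham then slept
print(cleaned.split())  # ['marys', 'lamb', 'ate', 'ham', 'then', 'slept']
```

If words such as "mary's" need to stay intact, stripping punctuation only from the ends of each word with word.strip(string.punctuation) may be a better fit.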

Answer 1: (score: 1)

The following scores your sentences by the number of matching words:

import re

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = [w for w, c in keyWords]     # only need the words

listOfSentences = [
    'mary had a little lamb.', 
    'she also had a little pram.',
    'bam bam bam she also loves ham.', 
    'she ate the lamb.']    

words = [re.findall(r'(\w+)', s) for s in listOfSentences]
keySentences = []

for word_list, sentence in zip(words, listOfSentences):
    keySentences.append((len([word for word in word_list if word in keyWords]), sentence))

for count, sentence in sorted(keySentences, reverse=True):
    print('{:2}  {}'.format(count, sentence))

This gives you the following output:

 3  bam bam bam she also loves ham.
 2  she ate the lamb.
 1  mary had a little lamb.
 0  she also had a little pram.
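If only the matching sentences are wanted as a list, as in the question, the counted pairs can be filtered before sorting. A minimal sketch under that assumption (`keyword_set` and `scored` are names introduced here, and dropping zero-count sentences is one reading of the question, not something it states):

```python
import re

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyword_set = {w for w, c in keyWords}  # a set gives fast membership tests

listOfSentences = [
    'mary had a little lamb.',
    'she also had a little pram.',
    'bam bam bam she also loves ham.',
    'she ate the lamb.']

scored = []
for sentence in listOfSentences:
    # count how many words of the sentence are keywords
    count = sum(1 for word in re.findall(r'\w+', sentence) if word in keyword_set)
    if count > 0:  # assumption: sentences without any keyword are dropped
        scored.append((count, sentence))

keySentences = [sentence for count, sentence in sorted(scored, reverse=True)]
print(keySentences)
# ['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']
```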

Answer 2: (score: 0)

Try it like this:

>>> [x for x in listOfSentences for i in keyWords if x.count(i[0])==i[1]]
['bam bam bam she also loves ham.', 'she ate the lamb.']
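Bear in mind that str.count() counts substring occurrences, so 'ate' would also match inside a word such as 'later'. A word-level variant could look like the sketch below, which uses collections.Counter (the sentence 'she came later.' is added purely for illustration, and `matches` is a hypothetical helper, not part of the answer above):

```python
import re
from collections import Counter

keyWords = {('bam', 3), ('lamb', 2), ('ate', 1)}

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.',
                   'she came later.']  # hypothetical extra sentence

def matches(sentence):
    """Return True if some keyword occurs as a whole word exactly `count` times."""
    counts = Counter(re.findall(r'\w+', sentence))
    # Counter returns 0 for missing words, so absent keywords never match
    return any(counts[word] == count for word, count in keyWords)

print([s for s in listOfSentences if matches(s)])
# ['bam bam bam she also loves ham.', 'she ate the lamb.']
```

With the substring-based version, 'she came later.' would have been included as well, because 'she came later.'.count('ate') is 1.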