Question

I have a list of sentences, listOfSentences, as shown below:

listOfSentences = ['mary had a little lamb.', 
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.', 
                   'she ate the lamb.']

I also have a collection of keywords, keyWords, as shown below:

keyWords= {('bam', 3), ('lamb', 2), ('ate', 1)}

where the number paired with each keyword reflects the frequency of that word.

The desired output is:

>>> print(keySentences)
['bam bam bam she also loves ham.', 'she ate the lamb.']

My question is: how can I compare the elements of keyWords with the elements of listOfSentences so as to produce the output list keySentences?

Answer 1

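keyWords would be more useful as a dictionary; getting the score for each word is then a simple dictionary lookup. The individual words can be extracted with split().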

Here is some code. It assumes that punctuation is part of a word (as the example result list keySentences suggests):

listOfSentences = ['mary had a little lamb.', 
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.', 
                   'she ate the lamb.']

keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)

keySentences = []
for sentence in listOfSentences:
    score = sum(keyWords.get(word, 0) for word in sentence.split())
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)

Output

['bam bam bam she also loves ham.', 'she ate the lamb.']

If you want to ignore punctuation, you can remove it from each sentence before processing:

import string

# mapping to remove punctuation with str.translate()
remove_punctuation = {ord(c): None for c in string.punctuation}

listOfSentences = ['mary had a little lamb.', 
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.', 
                   'she ate the lamb.']

keyWords= [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)

keySentences = []
for sentence in listOfSentences:
    score = sum(keyWords.get(word, 0) for word in sentence.translate(remove_punctuation).split())
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)

Output

['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']

The result list now also contains "mary had a little lamb." because the full stop trailing "lamb" was removed by str.translate().
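To make the effect of the translation table concrete, here is a small sketch on a single sentence. It builds the table with `str.maketrans`, which should be equivalent to the dict comprehension above; the names `with_punct` and `without_punct` are only illustrative.

```python
import string

# Map every punctuation character to None so that str.translate() deletes it.
# This is an alternative spelling of the dict comprehension used above.
remove_punctuation = str.maketrans('', '', string.punctuation)

sentence = 'mary had a little lamb.'

# Plain split() keeps the full stop attached to the last word.
with_punct = sentence.split()

# Translating first strips the full stop, so 'lamb' can match the keyword.
without_punct = sentence.translate(remove_punctuation).split()

print(with_punct[-1])     # lamb.
print(without_punct[-1])  # lamb
```

Because 'lamb.' is not a key in keyWords but 'lamb' is, only the translated version contributes to the score.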

Answer 2

The following will score your sentences based on the number of matching words:

import re

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = [w for w, c in keyWords]     # only need the words

listOfSentences = [
    'mary had a little lamb.', 
    'she also had a little pram.',
    'bam bam bam she also loves ham.', 
    'she ate the lamb.']    

words = [re.findall(r'(\w+)', s) for s in listOfSentences]
keySentences = []

for word_list, sentence in zip(words, listOfSentences):
    keySentences.append((len([word for word in word_list if word in keyWords]), sentence))

for count, sentence in sorted(keySentences, reverse=True):
    print('{:2}  {}'.format(count, sentence))

Giving you the following output:

 3  bam bam bam she also loves ham.
 2  she ate the lamb.
 1  mary had a little lamb.
 0  she also had a little pram.

Answer 3

Try this:

>>> [x for x in listOfSentences for i in keyWords if x.count(i[0])==i[1]]
['bam bam bam she also loves ham.', 'she ate the lamb.']
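One caveat worth noting: str.count() counts substring occurrences, so 'lamb' would also be counted inside a word such as 'lambda', and a sentence matching two keywords would appear twice in the result. The following is a hedged sketch of a word-level variant, not part of the answer itself; it assumes that words are runs of `\w` characters and uses `collections.Counter` to count them.

```python
import re
from collections import Counter

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']

keyWords = {('bam', 3), ('lamb', 2), ('ate', 1)}

keySentences = []
for sentence in listOfSentences:
    # Count whole words only, with punctuation stripped by the regex.
    counts = Counter(re.findall(r'\w+', sentence))
    # Keep the sentence once if any keyword occurs exactly the required
    # number of times; any() prevents duplicate entries.
    if any(counts[word] == n for word, n in keyWords):
        keySentences.append(sentence)

print(keySentences)
# ['bam bam bam she also loves ham.', 'she ate the lamb.']
```

For the example data this gives the same result as the one-liner; the two only differ when a keyword appears inside a longer word or when several keywords match the same sentence.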
