I have a list of sentences, listOfSentences, as follows:
listOfSentences = ['mary had a little lamb.',
'she also had a little pram.',
'bam bam bam she also loves ham.',
'she ate the lamb.']
I also have keyWords, a collection of (word, frequency) pairs, as follows:
keyWords= {('bam', 3), ('lamb', 2), ('ate', 1)}
where the more frequent a word is, the earlier it appears in keyWords.
I want the result to be:
>>> print(keySentences)
['bam bam bam she also loves ham.', 'she ate the lamb.']
My question is: how can I compare the elements of keyWords with the elements of listOfSentences so that I get the list keySentences as output?
Answer 0 (score: 1)
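keyWords would be more useful as a dictionary, because then getting each word's score is a simple dictionary lookup. The individual words can be extracted with split().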
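A quick sketch of that lookup on one of the sample sentences. It also shows that split() only breaks on whitespace, so the full stop stays attached to 'lamb.' and that token scores 0:

```python
# Convert the (word, weight) pairs into a dict for direct lookups.
keyWords = dict([('bam', 3), ('lamb', 2), ('ate', 1)])

# split() breaks on whitespace only, so punctuation stays attached to words.
tokens = 'she ate the lamb.'.split()
print(tokens)  # ['she', 'ate', 'the', 'lamb.']

# .get(word, 0) returns 0 for any token that is not a keyword.
scores = [keyWords.get(word, 0) for word in tokens]
print(scores)       # [0, 1, 0, 0]
print(sum(scores))  # 1
```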
Here is some code. It assumes that punctuation is part of a word (as the example result list keySentences suggests):
listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)   # word -> weight lookup table

keySentences = []
for sentence in listOfSentences:
    # sum the weights of all keywords in the sentence (non-keywords count 0)
    score = sum(keyWords.get(word, 0) for word in sentence.split())
    if score > 0:
        keySentences.append((score, sentence))

# highest score first, then drop the scores
keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)
Output
['bam bam bam she also loves ham.', 'she ate the lamb.']
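One design choice worth noting: sorted(keySentences, reverse=True) compares whole (score, sentence) tuples, so sentences with equal scores end up in reverse alphabetical order. If you would rather keep ties in their original order, a minimal sketch is to sort on the score alone and rely on Python's stable sort, which stays stable even with reverse=True. The helper rank_sentences, the name weights, and the extra sentence 'the zebra ate' are hypothetical, made up for illustration:

```python
def rank_sentences(sentences, weights):
    """Hypothetical helper: return sentences with a positive score, best first.

    Ties keep their original order because sorted()/list.sort() are stable.
    """
    scored = []
    for sentence in sentences:
        score = sum(weights.get(word, 0) for word in sentence.split())
        if score > 0:
            scored.append((score, sentence))
    # Sort on the score only, so the sentence text never breaks ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored]


weights = {'bam': 3, 'lamb': 2, 'ate': 1}
sentences = ['she ate the lamb.', 'bam bam bam she also loves ham.', 'the zebra ate']
print(rank_sentences(sentences, weights))
# ['bam bam bam she also loves ham.', 'she ate the lamb.', 'the zebra ate']
```

With the tuple-based sort, 'the zebra ate' would come before 'she ate the lamb.' because both score 1 and 't' sorts after 's'.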
If you want to ignore punctuation, you can remove it from each sentence before processing:
import string

# mapping to remove punctuation with str.translate()
remove_punctuation = {ord(c): None for c in string.punctuation}

listOfSentences = ['mary had a little lamb.',
                   'she also had a little pram.',
                   'bam bam bam she also loves ham.',
                   'she ate the lamb.']

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = dict(keyWords)

keySentences = []
for sentence in listOfSentences:
    # strip punctuation before splitting, so 'lamb.' becomes 'lamb'
    score = sum(keyWords.get(word, 0) for word in sentence.translate(remove_punctuation).split())
    if score > 0:
        keySentences.append((score, sentence))

keySentences = [sentence for score, sentence in sorted(keySentences, reverse=True)]
print(keySentences)
Output
['bam bam bam she also loves ham.', 'she ate the lamb.', 'mary had a little lamb.']
Now the result list also contains "mary had a little lamb." because str.translate() removed the full stop trailing "lamb".
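The same deletion table can also be built with str.maketrans(): when given a third argument, it maps every character in that string to None, which tells str.translate() to delete it. A short sketch showing the two tables are equal:

```python
import string

# str.maketrans('', '', chars) maps each character in chars to None.
table = str.maketrans('', '', string.punctuation)

# Same table as the dict comprehension used above.
assert table == {ord(c): None for c in string.punctuation}

print('she ate the lamb.'.translate(table).split())
# ['she', 'ate', 'the', 'lamb']
print("don't".translate(table))
# dont
```

Because string.punctuation includes the apostrophe and the hyphen, this also merges contractions and hyphenated words (for example "don't" becomes "dont"); whether that matters depends on your keywords.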
Answer 1 (score: 1)
The following scores your sentences by the number of matching words:
import re

keyWords = [('bam', 3), ('lamb', 2), ('ate', 1)]
keyWords = [w for w, c in keyWords]   # only need the words

listOfSentences = [
    'mary had a little lamb.',
    'she also had a little pram.',
    'bam bam bam she also loves ham.',
    'she ate the lamb.']

# extract the words of each sentence; \w+ skips punctuation
words = [re.findall(r'(\w+)', s) for s in listOfSentences]

keySentences = []
for word_list, sentence in zip(words, listOfSentences):
    # count how many of the sentence's words are keywords
    keySentences.append((len([word for word in word_list if word in keyWords]), sentence))

for count, sentence in sorted(keySentences, reverse=True):
    print('{:2} {}'.format(count, sentence))
This gives you the following output:
3 bam bam bam she also loves ham.
2 she ate the lamb.
1 mary had a little lamb.
0 she also had a little pram.
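Because only membership is tested here, one variant worth considering stores the keywords in a set, where each in check is constant time on average instead of a scan of the list, and uses sum() over a generator instead of building a list just to take its length. A minimal sketch of the same counting under that assumption (the set-valued keyWords and the name counts are illustrative changes, not part of the answer):

```python
import re

# A set of keywords: membership tests do not scan a list.
keyWords = {w for w, c in [('bam', 3), ('lamb', 2), ('ate', 1)]}

listOfSentences = [
    'mary had a little lamb.',
    'she also had a little pram.',
    'bam bam bam she also loves ham.',
    'she ate the lamb.']

# Count keyword tokens per sentence; repeated keywords count every time.
counts = [(sum(1 for word in re.findall(r'\w+', s) if word in keyWords), s)
          for s in listOfSentences]

for count, sentence in sorted(counts, reverse=True):
    print('{:2} {}'.format(count, sentence))
```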
Answer 2 (score: 0)
Try this:
>>> [x for x in listOfSentences for i in keyWords if x.count(i[0])==i[1]]
['bam bam bam she also loves ham.', 'she ate the lamb.']
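This comprehension keeps a sentence only when a keyword's count equals the number stored in keyWords exactly, and str.count() counts substrings rather than whole words. A short sketch of the substring behaviour on a made-up sentence, plus a whole-word count using re with \b word boundaries (count_word is a hypothetical helper and an alternative technique, not part of the answer above):

```python
import re

sentence = 'the pirate ate a bambino'

# str.count matches substrings: 'ate' also occurs inside 'pirate'.
print(sentence.count('ate'))  # 2
print(sentence.count('bam'))  # 1 (inside 'bambino')


def count_word(word, text):
    """Hypothetical helper: count whole-word occurrences of word in text."""
    # \b anchors the match at word boundaries; re.escape guards special chars.
    return len(re.findall(r'\b' + re.escape(word) + r'\b', text))


print(count_word('ate', sentence))  # 1
print(count_word('bam', sentence))  # 0
```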