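I am trying to match a list of words against a list of sentences and get the best-matching sentence back as a string: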
from itertools import chain
from nltk.corpus import gutenberg  # needs the NLTK Gutenberg corpus data installed

user = ['The', 'Macbeth', 'Tragedie']  # this list
plays = []
hamlet = gutenberg.sents('shakespeare-hamlet.txt')
macbeth = gutenberg.sents('shakespeare-macbeth.txt')
caesar = gutenberg.sents('shakespeare-caesar.txt')
plays.append(hamlet)
plays.append(macbeth)
plays.append(caesar)
shakespeare = list(chain.from_iterable(plays))  # with this list (all sentences, flattened)
'shakespeare' prints as follows:
[['[', 'The', 'Tragedie', 'of', 'Hamlet', 'by', 'William', 'Shakespeare', '1599', ']'], ['Actus', 'Primus', '.'], ['Scoena', 'Prima', '.'], ['Enter', 'Barnardo', 'and', 'Francisco', 'two', 'Centinels', '.']...['FINIS', '.'], ['THE', 'TRAGEDIE', 'OF', 'IVLIVS', 'CaeSAR', '.']]
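def best_match(user, shakespeare):  # wrapper name added here so that `return` is valid
    bestCount = 0
    answer = ''  # stays empty if no sentence shares a word with `user`
    for sent in shakespeare:
        # number of distinct words shared by `user` and this sentence
        currentCount = len(set(user).intersection(sent))
        if currentCount > bestCount:
            bestCount = currentCount
            answer = ' '.join(sent)
    return answer.lower(), bestCount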
However, the returned value is wrong: I get a sentence from Hamlet where I expected one from Macbeth:
('the tragedie of hamlet , prince of denmarke .', 3)
Where is the error?
Answer 0 (score: 0)
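It sounds like you shouldn't be using sets here. The most obvious problem is that you care about how many times a word occurs in a sentence (which starts out as a list), and converting to a set collapses every repeated word into a single occurrence, so that information is lost.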
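To make the difference concrete, here is a minimal sketch that compares the set-based score with a per-word count. The sentence and word list are invented for illustration and do not come from the corpus.

```python
# Hypothetical data, made up for this example (not taken from the plays).
sentence = ['the', 'king', 'and', 'the', 'queen', 'and', 'the', 'court']
user = ['the', 'queen']

# A set intersection only records WHICH words are shared, not how often.
shared = set(user).intersection(sentence)
set_score = len(shared)

# list.count keeps the number of occurrences of each word.
occurrences = {word: sentence.count(word) for word in user}

print(set_score)    # 2
print(occurrences)  # {'the': 3, 'queen': 1}
```

The set-based score is 2 no matter how many times 'the' is repeated, while the counts show it appears three times.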
I would suggest converting the members of each sentence to lowercase, like this:
mapped = list(map(str.lower, sentence))  # list() is needed on Py3 so that .count() works
Initialize a dictionary of counts like this:
In [6]: counts = {word.lower(): 0 for word in user}
In [7]: counts
Out[7]: {'macbeth': 0, 'the': 0, 'tragedie': 0}
Then, as you loop over the sentences, you can do:
In [8]: for word in counts:
   ...:     counts[word] = max(counts[word], mapped.count(word))
   ...:
In [9]: counts
Out[9]: {'macbeth': 0, 'the': 1, 'tragedie': 1}
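I only used one example sentence, but you get the idea. At the end you will have, for each of the user's words, the maximum number of times it appears in any single sentence. You could make the data structure a little more elaborate, or add an if-statement test, if you want to keep the sentence in which the words occur most often.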
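As a rough sketch of the "if-statement" variant, the function below keeps the sentence with the highest score. The name best_sentence and the scoring rule (the sum of the occurrences of all user words in a sentence) are assumptions made for this example, and the sample sentences are a tiny hand-made stand-in for the real corpus.

```python
def best_sentence(user_words, sentences):
    """Return (sentence_text, count) for the highest-scoring sentence."""
    targets = {w.lower() for w in user_words}
    best_text, best_count = '', 0
    for sent in sentences:
        lowered = [tok.lower() for tok in sent]
        # Assumed scoring rule: total occurrences of all target words.
        count = sum(lowered.count(w) for w in targets)
        if count > best_count:  # strict '>' keeps the first sentence on ties
            best_text, best_count = ' '.join(lowered), count
    return best_text, best_count


# Hand-made sample standing in for the flattened list of corpus sentences.
sample = [
    ['The', 'Tragedie', 'of', 'Hamlet'],
    ['The', 'Tragedie', 'of', 'Macbeth'],
    ['Actus', 'Primus', '.'],
]
result = best_sentence(['The', 'Macbeth', 'Tragedie'], sample)
print(result)  # ('the tragedie of macbeth', 3)
```

Because both the tokens and the user words are lowercased before counting, 'THE' and 'The' match the same target. Whether that is desirable depends on what you want to count as a match.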
Good luck!