Python: intersecting lists of strings gives the wrong result

Time: 2016-06-09 19:10:30

Tags: python list

I am trying to intersect a list of sentences with a list of strings:

    from itertools import chain
    from nltk.corpus import gutenberg

    user = ['The', 'Macbeth', 'Tragedie']  # this list
    plays = []

    hamlet = gutenberg.sents('shakespeare-hamlet.txt')
    macbeth = gutenberg.sents('shakespeare-macbeth.txt')
    caesar = gutenberg.sents('shakespeare-caesar.txt')
    plays.append(hamlet)
    plays.append(macbeth)
    plays.append(caesar)

    shakespeare = list(chain.from_iterable(plays))  # with this list

`shakespeare` prints as follows:

[['[', 'The', 'Tragedie', 'of', 'Hamlet', 'by', 'William', 'Shakespeare', '1599', ']'], ['Actus', 'Primus', '.'], ['Scoena', 'Prima', '.'], ['Enter', 'Barnardo', 'and', 'Francisco', 'two', 'Centinels', '.']...['FINIS', '.'], ['THE', 'TRAGEDIE', 'OF', 'IVLIVS', 'CaeSAR', '.']]

    bestCount = 0
    for sent in shakespeare:
        currentCount = len(set(user).intersection(sent))
        if currentCount > bestCount:
            bestCount = currentCount
            answer = ' '.join(sent)
    return ''.join(answer).lower(), bestCount
However, the return value is incorrect, i.e. it intersects "hamlet" with "macbeth":

('the tragedie of hamlet , prince of denmarke .', 3)

Where is the error?

1 answer:

Answer 0 (score: 0)

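It sounds like you shouldn't be using sets here. The most obvious problem is that you care about the number of occurrences of a word in a sentence (which starts out as a list), and by converting to a set you collapse all repeated words into a single occurrence, losing that information.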
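A minimal sketch of that information loss. The `user` and `sentence` values here are made up for illustration and are not taken from the corpus:

```python
from collections import Counter

user = ['the', 'tragedie']
sentence = ['the', 'tragedie', 'of', 'the', 'the', 'king']

# A set keeps each distinct word once, so repeated words vanish.
set_score = len(set(user).intersection(sentence))

# A Counter keeps how many times each word occurs in the sentence.
word_counts = Counter(sentence)
occurrences = {word: word_counts[word] for word in user}

print(set_score)    # 2
print(occurrences)  # {'the': 3, 'tragedie': 1}
```

The set-based score reports 2 no matter how often 'the' is repeated, while the per-word counts keep that detail.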

I would suggest converting the members of each sentence to lowercase, like so:

mapped = list(map(str.lower, sentence))  # list() is needed on Py3 so that mapped.count() works
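To see why the case normalisation matters, here is a small sketch. The `sentence` below is a made-up, all-caps example in the style of the corpus output, and `user_lower` is a name introduced only for this illustration:

```python
user = ['The', 'Macbeth', 'Tragedie']
sentence = ['THE', 'TRAGEDIE', 'OF', 'MACBETH', '.']

# Comparing the raw strings is case-sensitive, so nothing matches.
raw_matches = [word for word in user if word in sentence]

# Lowercase both sides before comparing.
user_lower = [word.lower() for word in user]
mapped = [word.lower() for word in sentence]
matches = [word for word in user_lower if word in mapped]

print(raw_matches)  # []
print(matches)      # ['the', 'macbeth', 'tragedie']
```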

Initialize a dictionary of counts like this:

In [6]: counts = {word.lower(): 0 for word in user}

In [7]: counts
Out[7]: {'macbeth': 0, 'the': 0, 'tragedie': 0}

Then, as you loop over the sentences, you can do this:

In [8]: for word in counts:
   ...:     counts[word] = max(counts[word], mapped.count(word))
   ...:

In [9]: counts
Out[9]: {'macbeth': 0, 'the': 1, 'tragedie': 1}

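I only used one example sentence, but you get the idea. In the end, you will have the maximum number of times each of the user's words appears in a single sentence. You can make the data structure a little more complex, or use an if-statement test, if you want to keep the sentence in which the words appear most often.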
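One possible way to keep the best sentence with an if test is sketched below. The function name `best_sentence` and the `sample` data are hypothetical; `sample` only stands in for the flattened `shakespeare` list, and the score is assumed to be the total number of occurrences of the user's words in a sentence:

```python
def best_sentence(user, sentences):
    """Return (sentence_text, score) for the sentence containing the most
    occurrences of the user's words, compared case-insensitively."""
    targets = [word.lower() for word in user]
    best_text, best_score = None, 0
    for sent in sentences:
        mapped = [word.lower() for word in sent]
        # list.count() counts every occurrence, so repeats are not lost.
        score = sum(mapped.count(word) for word in targets)
        if score > best_score:
            best_text, best_score = ' '.join(mapped), score
    return best_text, best_score


# Hypothetical stand-in for the flattened `shakespeare` list.
sample = [
    ['Actus', 'Primus', '.'],
    ['The', 'Tragedie', 'of', 'Hamlet'],
    ['THE', 'TRAGEDIE', 'OF', 'MACBETH', '.'],
]
result = best_sentence(['The', 'Macbeth', 'Tragedie'], sample)
print(result)  # ('the tragedie of macbeth .', 3)
```

If no sentence contains any of the words, this sketch returns `(None, 0)` rather than raising an error on an unassigned variable.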

Good luck!