Python - Get the most frequent word in a sentence, returning the alphabetically first word on a tie

Time: 2017-09-21 08:44:19

Tags: python-2.7

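I wrote the code below. It runs without errors, but when two words in a sentence are repeated the same number of times, it does not return the one that comes first alphabetically. Can anyone suggest an alternative? The code will be evaluated in Python 2.7.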

"""Quiz: Most Frequent Word"""

def most_frequent(s):
    """Return the most frequently occurring word in s."""

    """ Step 1 - The following assumptions have been made:
        - Space is the default delimiter
        - There are no other punctuation marks that need removing
        - Convert all letters into lower case"""


    word_list_array = s.split()


    """Step 2 - sort the list alphabetically"""

    word_sort = sorted(word_list_array, key=str.lower)

    """Step 3 - count the number of times word has been repeated in the word_sort array.
                create another array containing the word and the frequency in which it is repeated"""

    wordfreq = []
    freq_wordsort = []
    for w in word_sort:
        wordfreq.append(word_sort.count(w))
        freq_wordsort = zip(wordfreq, word_sort)


    """Step 4 - output the array having the maximum first index variable and output the word in that array"""

    max_word = max(freq_wordsort)
    word = max_word[-1]


    result = word

    return result


def test_run():
    """Test most_frequent() with some inputs."""
    print most_frequent("london bridge is falling down falling down falling down london bridge is falling down my fair lady") # expected: 'down' (this code returns 'falling')
    print most_frequent("betty bought a bit of butter but the butter was bitter") # output: 'butter'


if __name__ == '__main__':
    test_run()
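
The tie behaviour described in the question can be reproduced in isolation. This is a minimal sketch, assuming the cause is how max() compares the (count, word) tuples produced by zip; the `pairs` list and the `largest` name are made up for illustration and are not part of the original code.

```python
# Tuples compare element by element, so equal counts fall through to the word.
pairs = [(4, 'down'), (4, 'falling'), (2, 'bridge')]

# max() picks the largest tuple: highest count first,
# then the alphabetically LAST word among the tied counts.
largest = max(pairs)
print(largest[-1])  # falling
```

With equal counts, 'falling' compares greater than 'down', which would explain why the alphabetically later word is returned.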

1 Answer:

Answer 0 (score: 0)

Without changing your code too much, I found that a good solution is to use the index method.

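After finding the highest frequency (max_word), call the index method on wordfreq with max_word as the argument; it returns the position of that value in the list. Then return the word at that position in word_sort. Because word_sort is sorted alphabetically and index returns the first match, a tie resolves to the alphabetically first word.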
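
The first-match behaviour of index can be checked on a small example. This is only a sketch: the variable names mirror the ones in the question, but the sample words and the `first_pos` name are invented for illustration.

```python
# Words already sorted alphabetically, with a parallel list of their counts.
word_sort = ['a', 'b', 'b', 'c', 'c']
wordfreq = [word_sort.count(w) for w in word_sort]  # [1, 2, 2, 2, 2]

# list.index() returns the position of the FIRST match,
# so among tied counts the earliest (alphabetically first) word wins.
first_pos = wordfreq.index(max(wordfreq))
print(word_sort[first_pos])  # b
```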

The modified code is shown below (I removed the zip call because it is no longer needed, and added two simpler test cases):

"""Quiz: Most Frequent Word"""



def most_frequent(s):
    """Return the most frequently occurring word in s."""

    """ Step 1 - The following assumptions have been made:
        - Space is the default delimiter
        - There are no other punctuation marks that need removing
        - Convert all letters into lower case"""


    word_list_array = s.split()


    """Step 2 - sort the list alphabetically"""

    word_sort = sorted(word_list_array, key=str.lower)

    """Step 3 - count the number of times word has been repeated in the word_sort array.
                create another array containing the word and the frequency in which it is repeated"""

    wordfreq = []
    # freq_wordsort = []
    for w in word_sort:
        wordfreq.append(word_sort.count(w))
        # freq_wordsort = zip(wordfreq, word_sort)


    """Step 4 - output the array having the maximum first index variable and output the word in that array"""

    max_word = max(wordfreq)
    word = word_sort[wordfreq.index(max_word)] # <--- solution!


    result = word

    return result


def test_run():
    """Test most_frequent() with some inputs."""
    print(most_frequent("london bridge is falling down falling down falling down london bridge is falling down my fair lady")) # output: 'down'
    print(most_frequent("betty bought a bit of butter but the butter was bitter")) # output: 'butter'
    print(most_frequent("a a a a b b b b")) #output: 'a'
    print(most_frequent("z z j j z j z j")) #output: 'j'


if __name__ == '__main__':
    test_run()
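
As a possible alternative, the same result can probably be reached without sorting or calling count inside a loop. The sketch below is not the approach used in the answer above: it swaps in collections.Counter (available in Python 2.7) and a composite key. The function name `most_frequent_counter` is hypothetical, and it assumes, as the question does, that the input is lower case and space-delimited.

```python
from collections import Counter


def most_frequent_counter(s):
    """Hypothetical alternative: most frequent word, ties broken alphabetically."""
    # Count every word in a single pass over the split sentence.
    counts = Counter(s.split())
    # min() over (-count, word): the highest count wins first,
    # then the alphabetically first word among equal counts.
    return min(counts, key=lambda w: (-counts[w], w))


print(most_frequent_counter("z z j j z j z j"))  # j
```

Note that this sketch raises ValueError on an empty string, because min() receives no words, and it compares words case-sensitively, so mixed-case input would need to be lowercased first.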