Python: find the word that shows up the most?

Asked: 2013-07-14 23:57:17

Tags: python file text word

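I'm trying to get my program to report the word that shows up the most in a text file. For example, if I type "Hello I like pie because they are so good", the program should print "like occurred the most". I get this error when executing option 3: KeyError: 'h'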

#Prompt the user to enter a block of text.
done = False
textInput = ""
while(done == False):
    nextInput= input()
    if nextInput== "EOF":
        break
    else:
        textInput += nextInput

#Prompt the user to select an option from the Text Analyzer Menu.
print("Welcome to the Text Analyzer Menu! Select an option by typing a number"
    "\n1. shortest word"
    "\n2. longest word"
    "\n3. most common word"
    "\n4. left-column secret message!"
    "\n5. fifth-words secret message!"
    "\n6. word count"
    "\n7. quit")

#Set option to 0.
option = 0

#Use the 'while' to keep looping until the user types in Option 7.
while option !=7:
    option = int(input())

#The error occurs in this specific section of the code.
#If the user selects Option 3,
    elif option == 3:
        word_counter = {}
        for word in textInput:
            if word in textInput:
                word_counter[word] += 1
            else:
                word_counter[word] = 1

        print("The word that showed up the most was: ", word)

5 answers:

Answer 0 (score: 2)

I think you probably want to do:

for word in textInput.split():
  ...

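Currently, you are just iterating over every character in textInput. To iterate over every word, we must first split the string into a list of words. By default, .split() splits on whitespace, but you can change this by passing a separator to split().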
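
To see where the KeyError: 'h' comes from, here is a small sketch (the sample strings are made up for illustration) comparing iteration over the string itself with iteration over the result of split():

```python
text = "hello I like pie"  # hypothetical sample input

# Iterating over a str yields single characters, so the first "word" is 'h'.
chars = [piece for piece in text]

# split() with no arguments breaks on any run of whitespace.
words = text.split()

# A different separator can be passed explicitly.
fields = "a,b,c".split(",")

print(chars[:3])  # ['h', 'e', 'l']
print(words)      # ['hello', 'I', 'like', 'pie']
print(fields)     # ['a', 'b', 'c']
```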


Also, you need to check whether the word is in your dictionary, not in the original string. So try:

if word in word_counter:
  ...

Then, to find the entry with the most occurrences:

highest_word = ""
highest_value = 0

for k,v in word_counter.items():
  if v > highest_value:
    highest_value = v
    highest_word = k

Then just print the values of highest_word and highest_value.


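To keep track of ties, keep a list of the highest words. If we find a higher count, clear the list and start rebuilding it. Here is the full program so far: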

textInput = "He likes eating because he likes eating"
word_counter = {}
for word in textInput.split():
  if word in word_counter:
    word_counter[word] += 1
  else:
    word_counter[word] = 1


highest_words = []
highest_value = 0

for k,v in word_counter.items():
  # if we find a new value, create a new list,
  # add the entry and update the highest value
  if v > highest_value:
    highest_words = []
    highest_words.append(k)
    highest_value = v
  # else if the value is the same, add it
  elif v == highest_value:
    highest_words.append(k)

# print out the highest words (print() call syntax, matching the Python 3 code in the question)
for word in highest_words:
  print(word)

Answer 1 (score: 2)

Rather than rolling your own counter, a better idea is to use Counter from the collections module:

>>> input = 'blah and stuff and things and stuff'
>>> from collections import Counter
>>> c = Counter(input.split())
>>> c.most_common()
[('and', 3), ('stuff', 2), ('things', 1), ('blah', 1)]
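
If only the top word is needed, most_common also accepts a count. A minimal sketch, reusing the sample text from the session above (the variable names are illustrative, not from the question):

```python
from collections import Counter

text = "blah and stuff and things and stuff"  # sample text from the session above
counts = Counter(text.split())

# most_common(1) returns a list holding a single (word, count) pair.
top_word, top_count = counts.most_common(1)[0]
print("The word that showed up the most was:", top_word)  # and
```

Note that most_common(1) reports only one word even when several are tied for first place.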

Also, as a matter of general code style, please avoid adding comments like this:

#Set option to 0.
option = 0

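It makes your code less readable, not more.

Answer 2 (score: 1)

The original answer is certainly correct, but you may want to keep in mind that it won't show you ties for first place. A sentence like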

A life in the present is a present itself.

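will only show either 'a' or 'present' as the number one hit. In fact, since dictionaries are (generally) unordered, the result you see might not even be the first word that is repeated multiple times.

If you need to report multiples, may I suggest the following:

1) Keep your current key-value approach of 'word': 'hits'.
2) Determine the greatest value of 'hits'.
3) Check which values equal the greatest number of hits, and add those keys to a list.
4) Iterate through the list to display the words with the greatest number of hits.

For example: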

greatestNumber = 0
# establish the highest number for wordCounter.values()
for hits in wordCounter.values():
    if hits > greatestNumber:
        greatestNumber = hits

topWords = []
#find the keys that are paired to that value and add them to a list
#we COULD just print them as we iterate, but I would argue that this
#makes this function do too much
for word in wordCounter.keys():
    if wordCounter[word] == greatestNumber:
        topWords.append(word)

#now reveal the results
print("The words that showed up the most, with %d hits:" % greatestNumber)
for word in topWords:
    print(word)

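Depending on whether you use Python 2.7 or Python 3, your mileage (and syntax) may vary. But ideally, IMHO, you first determine the greatest number of hits and then go back and add the relevant entries to a new list.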
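
The same two-pass idea can be written more compactly with max() and a list comprehension. This is only a sketch; word_counter stands in for a dictionary built as in the question and is filled here with made-up counts:

```python
# Hypothetical counts, standing in for the dictionary built from the text.
word_counter = {"he": 1, "likes": 2, "eating": 2, "because": 1}

# Pass 1: the highest count (default=0 guards against an empty dict).
greatest_number = max(word_counter.values(), default=0)

# Pass 2: every word whose count equals that maximum.
top_words = [w for w, hits in word_counter.items() if hits == greatest_number]

print(top_words)  # ['likes', 'eating']
```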

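EDIT: you should probably use Counter from the collections module, as suggested in another answer. I didn't even know this was something Python came prepared to do. Haha, don't accept my answer unless you absolutely must write your own counter! It seems there is already a module for that.

Answer 3 (score: 0)

With Python 3.6+, you can use statistics.mode.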
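
A minimal sketch of what that could look like, assuming the text has already been read into a string (the sample sentence is made up):

```python
from statistics import mode

text = "hello I like pie because they are like so good"  # hypothetical input

# mode() returns the most common item in the sequence of words.
most_common_word = mode(text.split())
print("The word that showed up the most was:", most_common_word)  # like
```

On Python 3.8 and later, mode returns the first of several equally common values, and statistics.multimode returns all of them; earlier versions raise StatisticsError when there is a tie.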

Answer 4 (score: -1)

I'm not too hot on Python, but in your final print statement, shouldn't you have a %s?

i.e.: print("The word that showed up the most was: %s", word)