Can someone help explain this Python code?

Time: 2013-08-14 01:14:53

Tags: python counter word-count

Can someone help explain this Python code? I'm trying to learn more about Python, and I'm interested in how this works.

from collections import Counter
ignore = ['the','a','if','in','it','of','or', 'to','for','is','and','are']


file = raw_input("choose a file: ")
filename = "" + file + ".txt"


def main():
    with open(filename, 'r') as f:
        p = f.read() # the variable p contains the contents of the file

        words = p.split() #split the text file into individual words (strings)

        wordCount = len(words) #count length of string, how many times does a specific string appear
        print "The total number of words in the text file are:", wordCount

        #  Grab the top N words as input

        counted_words = Counter(words)

        for word in ignore: #check strings for any of the ignore words.
            del counted_words[word] #delete words from

        defined_words = int(raw_input("How many words do you want?")) #define amount of words to show
        for word, count in counted_words.most_common(defined_words): #parse and store the most common words from the file
            print word, count #print output

main()

I tried to comment what I could, but I'm not sure whether my comments are right.

2 Answers:

Answer 0: (score: 1)

Here is a full commentary on each line.

from collections import Counter
# Counter is a class for counting how often each item occurs (frequency counting)
ignore = ['the','a','if','in','it','of','or', 'to','for','is','and','are']
# this is just a list of the words to ignore


file = raw_input("choose a file: ")
# prompts for input; incidentally, the input is not verified or tested in any way
filename = "" + file + ".txt"
# the leading "" adds nothing; this should be filename = file + ".txt"


def main():
    with open(filename, 'r') as f:
        # "with" opens the file and closes it again when the block ends, even if the code raises an error; it's the right way to open files
        p = f.read() # the variable p contains the contents of the file

        words = p.split() #split the text file into individual words (strings)

        wordCount = len(words) #count length of string, how many times does a specific string appear
        # your comment is wrong: this is the length of the list, i.e. the number of words in the file
        print "The total number of words in the text file are:", wordCount

        #  Grab the top N words as input (this is also wrong)

        counted_words = Counter(words)
        # this makes a frequency counter
        # it works sort of like {"word": number_of_occurrences}, with a few extras, and it can be accessed as such
        # it is actually a subclass of dict; see the collections.Counter documentation for more info

        for word in ignore: #check strings for any of the ignore words.
            del counted_words[word] #deletes that word's frequency count entry (no error if the word is absent)

        defined_words = int(raw_input("How many words do you want?")) #define amount of words to show
        for word, count in counted_words.most_common(defined_words):
            # counted_words.most_common(defined_words) is a list of tuples
            # it has at most defined_words items
            # each item is (word, count), and the list is sorted by count, highest first
            print word, count #print output

main()

Answer 1: (score: 0)

Most of this is explained in the documentation for collections.Counter:
http://docs.python.org/2/library/collections.html#collections.Counter

As for the logic...

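wordCount is the length of a list, not of a string, and it counts the total number of words found in the original input, before the ignored words are removed.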
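To make the list-versus-string distinction concrete, here is a minimal sketch in Python 3 syntax (the question's code is Python 2). The sample text is made up for illustration and stands in for the file contents.

```python
# Hypothetical sample text, standing in for the contents of the file.
text = "the cat and the hat"

words = text.split()      # split on whitespace into a list of strings

char_count = len(text)    # length of the string: number of characters
word_count = len(words)   # length of the list: number of words

print(char_count, word_count)
```

Note that word_count still includes "the" and "and", because nothing has been filtered at this point.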

"Grab the top N words" describes the entire rest of the program, not just the next statement.

counted_words is a Counter collection, which is essentially a dictionary with the words as keys and the corresponding word counts as values.
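A small sketch of that dictionary-like behaviour, in Python 3 syntax and with a made-up word list, might look like this:

```python
from collections import Counter

# Hypothetical word list, standing in for p.split() in the question.
words = "the cat and the hat and the bat".split()
counted_words = Counter(words)

is_dict = isinstance(counted_words, dict)   # Counter is a subclass of dict
the_count = counted_words["the"]            # ordinary key lookup
dog_count = counted_words["dog"]            # a missing key yields 0 instead of raising KeyError

print(is_dict, the_count, dog_count)
```

Looking up a missing word returns 0 without adding that word to the counter.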

The "for word in ignore:" loop deletes any of the listed common words.
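One detail worth knowing is that deleting a key from a Counter does not raise KeyError when the word is absent, which is why the loop needs no membership check. A minimal Python 3 sketch, using a shortened, hypothetical ignore list:

```python
from collections import Counter

ignore = ['the', 'and', 'of']   # shortened ignore list, for illustration only
counted_words = Counter("the cat and the hat".split())

for word in ignore:
    # 'of' never occurs in the text; Counter skips it silently,
    # whereas a plain dict would raise KeyError here.
    del counted_words[word]

remaining = sorted(counted_words)
print(remaining)
```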

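The .most_common(number) method returns a list of the (number) words with the highest counts, in descending order of count. These are the most common (non-trivial) words in the original input.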
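As a quick illustration of the return value (Python 3 syntax, with made-up counts chosen so that there are no ties):

```python
from collections import Counter

# Hypothetical counts, built directly from a mapping for brevity.
counted_words = Counter({"cat": 4, "hat": 2, "bat": 1})

top_two = counted_words.most_common(2)    # list of (word, count) tuples
top_ten = counted_words.most_common(10)   # asking for more than exist returns them all

print(top_two)
print(len(top_ten))
```

Because each item is a (word, count) tuple, the loop in the question can unpack it directly with "for word, count in ...".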

The final for loop prints them out.
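Putting the steps together, the same logic could be sketched in Python 3 as a function that takes the text and the number of words as arguments instead of prompting for them. The function name top_words and the sample sentence are hypothetical, not part of the original program.

```python
from collections import Counter

IGNORE = ['the', 'a', 'if', 'in', 'it', 'of', 'or', 'to', 'for', 'is', 'and', 'are']

def top_words(text, n):
    """Return the total word count and the n most common non-ignored words."""
    words = text.split()              # individual whitespace-separated words
    total = len(words)                # counted before any word is ignored
    counted_words = Counter(words)    # word -> number of occurrences
    for word in IGNORE:
        del counted_words[word]       # no error if the word is absent
    return total, counted_words.most_common(n)

# Made-up sample text, standing in for the contents of the file.
sample = "the cat sat on the mat and the cat ran"
total, top = top_words(sample, 1)

print("The total number of words in the text is:", total)
for word, count in top:
    print(word, count)
```

Separating the counting from the input and output this way makes the logic easy to check without a file on disk.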