Can someone help explain this Python code? I'm trying to learn more about Python, and I'm interested in how this works.
```python
from collections import Counter
ignore = ['the','a','if','in','it','of','or', 'to','for','is','and','are']
file = raw_input("choose a file: ")
filename = "" + file + ".txt"
def main():
    with open(filename, 'r') as f:
        p = f.read() # the variable p contains the contents of the file
    words = p.split() #split the text file into individual words (strings)
    wordCount = len(words) #count length of string, how many times does a specific string appear
    print "The total number of words in the text file are:", wordCount
    # Grab the top N words as input
    counted_words = Counter(words)
    for word in ignore: #check strings for any of the ignore words.
        del counted_words[word] #delete words from
    defined_words = int(raw_input("How many words do you want?")) #define amount of words to show
    for word, count in counted_words.most_common(defined_words): #parse and store the most common words from the file
        print word, count #print output
main()
```
I tried to comment what I could, but I'm not sure whether it's right.
Answer 0 (score: 1)
Here is a full commentary on every line.
```python
from collections import Counter
# Counter is a dict subclass for frequency counting
ignore = ['the','a','if','in','it','of','or', 'to','for','is','and','are']
# this is just a list
file = raw_input("choose a file: ")
# prompts for input; incidentally, it is not validated or tested in any way
filename = "" + file + ".txt"
# should be filename = file + ".txt"
def main():
    with open(filename, 'r') as f:
        # "with" opens the file for working and closes it afterwards, even if the code raises an error; it's the right way to open files
        p = f.read() # the variable p contains the contents of the file
    words = p.split() #split the text file into individual words (strings)
    wordCount = len(words) #count length of string, how many times does a specific string appear
    # your comment is wrong: this counts the number of words in the file
    print "The total number of words in the text file are:", wordCount
    # Grab the top N words as input (this is also wrong)
    counted_words = Counter(words)
    # this makes a frequency counter
    # it works like {"word": number_of_occurrences} and can be accessed as such
    # it's actually a subclass of dict with extra counting methods such as most_common(); see the collections docs for more info
    for word in ignore: #check strings for any of the ignore words.
        del counted_words[word] #deletes frequency count entries
    defined_words = int(raw_input("How many words do you want?")) #define amount of words to show
    for word, count in counted_words.most_common(defined_words):
        # counted_words.most_common(defined_words) is a list of tuples
        # it has at most defined_words items
        # each item is (word, count); the list is sorted by count, highest first
        print word, count #print output
main()
```
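For comparison, here is a minimal Python 3 sketch of the same program (the thread's code is Python 2: raw_input became input and print became a function). It applies the filename = file + ".txt" fix suggested above, re-prompts until the file exists, and moves the counting into a function so it can be tried on a string without a file. The helper names ask_for_file and count_words are hypothetical, not part of the original code.

```python
import os
from collections import Counter

# Same stop-word list as the question's code
IGNORE = ['the', 'a', 'if', 'in', 'it', 'of', 'or', 'to', 'for', 'is', 'and', 'are']


def ask_for_file():
    """Hypothetical helper: keep prompting until the named .txt file exists."""
    while True:
        filename = input("choose a file: ") + ".txt"
        if os.path.isfile(filename):
            return filename
        print("No such file:", filename)


def count_words(text, ignore=IGNORE):
    """Hypothetical helper: return (total word count, Counter without ignored words)."""
    words = text.split()              # list of whitespace-separated words
    counted_words = Counter(words)
    for word in ignore:
        del counted_words[word]       # Counter skips missing keys instead of raising KeyError
    return len(words), counted_words  # the total is taken before any words are ignored


def main():
    with open(ask_for_file()) as f:   # the file is closed even if an error occurs
        text = f.read()
    total, counted_words = count_words(text)
    print("The total number of words in the text file are:", total)
    n = int(input("How many words do you want? "))
    for word, count in counted_words.most_common(n):
        print(word, count)

# Call main() to run the program interactively.
```

main() is left uncalled so the helpers can be imported on their own; for example, count_words("the cat and the dog and the cat") gives a total of 8 and a Counter holding cat twice and dog once.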
Answer 1 (score: 0)
Most of this is explained in the documentation for collections.Counter:
http://docs.python.org/2/library/collections.html#collections.Counter
As for the logic...
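wordCount is the length of a list, not of a string, and it is the total number of words found in the original input, computed before the ignored words are removed.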
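A quick Python 3 illustration of that point, using a made-up sentence: words is a list, len() counts its items, and that total is fixed before any word is deleted from the Counter.

```python
from collections import Counter

words = "the cat sat on the mat".split()
print(type(words).__name__)              # list
word_count = len(words)                  # counts every word, including both "the"s
counted = Counter(words)
del counted["the"]                       # removing a word afterwards...
print(word_count, sum(counted.values())) # ...does not change word_count: 6 4
```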
"Grab the top N words" describes the whole rest of the program, not just the next statement.
counted_words is a Counter collection, which is essentially a dictionary, with the words as keys and the corresponding word counts as values.
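A small Python 3 sketch of that dictionary behaviour, with a made-up sentence: lookups work like a normal dict, and a word that never appeared reports 0 instead of raising KeyError (and is not added as a key).

```python
from collections import Counter

counted_words = Counter("to be or not to be".split())
print(isinstance(counted_words, dict))           # True: Counter subclasses dict
print(counted_words["to"], counted_words["be"])  # 2 2
print(counted_words["missing"])                  # 0: absent words count as zero
print(dict(counted_words))                       # a plain {word: count} dictionary
```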
The "for word in ignore:" loop removes any of the listed common words.
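One detail that loop relies on, shown in Python 3 with a made-up sentence and a shortened ignore list: del on a Counter silently skips words that are not present, whereas a plain dict raises KeyError. The matching is also case-sensitive, so a capitalised "The" is not removed.

```python
from collections import Counter

ignore = ['the', 'a', 'and']  # shortened stop-word list for the example
counted_words = Counter("The cat and the hat".split())
for word in ignore:
    del counted_words[word]   # "a" never occurs; Counter just skips it
print(counted_words)          # "The" is still there: the comparison is case-sensitive

plain = dict(counted_words)
try:
    del plain["a"]            # a plain dict does raise for a missing key
    raised = False
except KeyError:
    raised = True
print("plain dict raised KeyError:", raised)
```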
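The .most_common(number) method returns a list of (word, count) pairs for the (number) words with the highest counts, in descending order of count. These are the most common (non-trivial) words in the original input.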
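A Python 3 sketch with a made-up sentence, showing the descending order and that asking for more words than exist simply returns all of them.

```python
from collections import Counter

counted_words = Counter("red blue red green red blue".split())
top_two = counted_words.most_common(2)
print(top_two)                        # [('red', 3), ('blue', 2)]
print(counted_words.most_common(10))  # only 3 distinct words, so 3 pairs come back
```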
The final for loop prints each word and its count.