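Google's Python class: wordcount.py

Time: 2016-07-15 20:56:24

Tags: python

I'm working through Google's Python class, which is written for Python 2.7. I'm running Python 3.5.2.

The script works. This is one of the exercises I completed.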

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Wordcount exercise
Google's Python class

The main() below is already defined and complete. It calls print_words()
and print_top() functions which you write.

1. For the --count flag, implement a print_words(filename) function that counts
how often each word appears in the text and prints:
word1 count1
word2 count2
...

Print the above list in order sorted by word (python will sort punctuation to
come before letters -- that's fine). Store all the words as lowercase,
so 'The' and 'the' count as the same word.

2. For the --topcount flag, implement a print_top(filename) which is similar
to print_words() but which prints just the top 20 most common words sorted
so the most common word is first, then the next most common, and so on.

Use str.split() (no arguments) to split on all whitespace.

Workflow: don't build the whole program at once. Get it to an intermediate
milestone and print your data structure and sys.exit(0).
When that's working, try for the next milestone.

Optional: define a helper function to avoid code duplication inside
print_words() and print_top().

"""

import sys

# +++your code here+++
# Define print_words(filename) and print_top(filename) functions.
# You could write a helper utility function that reads a file
# and builds and returns a word/count dict for it.
# Then print_words() and print_top() can just call the utility function.

###

def word_count_dict(filename):
  """Returns a word/count dict for this filename."""
  # Utility used by print_words() and print_top().
  word_count={} #Map each word to its count
  input_file=open(filename, 'r')
  for line in input_file:
    words=line.split()
    for word in words:
      word=word.lower()
      # Special case if we're seeing this word for the first time.
      if not word in word_count:
        word_count[word]=1
      else:
        word_count[word]=word_count[word] + 1
  input_file.close() # Not strictly required, but good form.
  return word_count

def print_words(filename):
  """Prints one per line '<word> <count>' sorted by word for the given file."""
  word_count=word_count_dict(filename)
  words=sorted(word_count.keys())
  for word in words:
    print(word,word_count[word])

def get_count(word_count_tuple):
  """Returns the count from a dict word/count tuple -- used for custom sort."""
  return word_count_tuple[1]

def print_top(filename):
  """Prints the top count listing for the given file."""
  word_count=word_count_dict(filename)

  # Each item is a (word, count) tuple.
  # Sort them so the big counts are first, using key=get_count to extract the count.
  items=sorted(word_count.items(), key=get_count, reverse=True)

  # Print the first 20
  for item in items[:20]:
    print(item[0], item[1])

# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.
def main():
  if len(sys.argv) != 3:
    print('usage: ./wordcount.py {--count | --topcount} file')
    sys.exit(1)

  option = sys.argv[1]
  filename = sys.argv[2]
  if option == '--count':
    print_words(filename)
  elif option == '--topcount':
    print_top(filename)
  else:
    print('unknown option: ' + option)
    sys.exit(1)

if __name__ == '__main__':
  main()

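Here are my questions, which the class doesn't answer:

  1. In the code below, I'm not sure what the 1 and the + 1 mean. Does word_count[word]=1 mean "if the word is not in the list, add it to the list"? And I don't understand what word_count[word]=word_count[word] + 1 means.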

      if not word in word_count:
        word_count[word]=1
      else:
        word_count[word]=word_count[word] + 1
    
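  2. Where it says word_count.keys(), I'm not sure what it does, other than call the keys of the dictionary we defined and loaded the keys and values into. I'd just like to understand why word_count.keys() is there.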

      words=sorted(word_count.keys())
    
  3. word_count is redefined in several places. I'd like to know why that is done rather than creating a new variable name such as word_count1.

      word_count={}
      word_count=word_count_dict(filename)
      ...and also in places outlined in my 1st question.
    
  4. Does if len(sys.argv) != 3: mean "if my arguments are not 3", or "if my characters are not 3" (e.g. sys.argv[1], sys.argv[2], sys.argv[3])?

Thanks for your help!

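1 Answer:

Answer 0: (score: 0)

  1. If word is not yet in the dictionary, we create a new entry for it and set its value to the number 1, because so far we have seen the word once. Otherwise, we retrieve the old value from the dictionary, add 1 to it with + 1, and store the result back in the dictionary entry by assigning to word_count[word]. The else branch can also be written as: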

    word_count[word] += 1
    
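  2. word_count.keys() returns all of the keys in the word_count dictionary (a list in Python 2, a view object in Python 3). It is used here so that sorted() can put the words in alphabetical order before the dictionary's contents are printed. If you printed the dictionary as is, the words would appear in whatever order the dictionary stores them, which is not alphabetical.

  3. The variable is not being redefined. Variables are local to each function, so each word_count is a different variable. They just happen to use the same name in each function because it is a good name for what the variable holds.

  4. List indexes start at 0, so if len(sys.argv) != 3: checks whether you have argv[0], argv[1], and argv[2]. argv[0] always contains the script name, so this checks that you passed exactly 2 arguments to the script. The first argument must be --count or --topcount, and the second must be the name of the file whose words are counted.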
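
To make point 1 concrete, here is a minimal sketch of the same counting pattern applied to a string instead of a file. The function name count_words and the sample sentence are made up for illustration; they are not part of the exercise.

```python
def count_words(text):
    """Return a dict mapping each lowercased word in text to its count."""
    word_count = {}
    for word in text.split():
        word = word.lower()
        if word not in word_count:
            # First sighting: create the entry with a count of 1.
            word_count[word] = 1
        else:
            # Seen before: read the old count, add 1, store it back.
            word_count[word] += 1
    return word_count


# Hypothetical sample input, just to watch the counts accumulate.
print(count_words("The cat saw the other cat"))
```

If you prefer to avoid the if/else, the whole branch could probably be collapsed to word_count[word] = word_count.get(word, 0) + 1, since dict.get returns the default 0 for a word that is not yet present.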
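
For point 2, a small sketch of what sorted(word_count.keys()) produces. The dictionary contents are invented sample data, not output from the exercise.

```python
# Made-up sample data standing in for the dict built from a file.
word_count = {"the": 2, "cat": 2, "saw": 1, "a": 3}

# keys() exposes only the words; sorted() returns them as a new sorted list.
words = sorted(word_count.keys())
for word in words:
    print(word, word_count[word])

# Iterating over a dict yields its keys, so this builds the same list.
assert sorted(word_count) == words
```

The dictionary itself is left untouched; only the separate list of words is sorted, and each word is then used to look its count back up.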
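
Point 3 can be illustrated with a sketch like the following, where all three function names are hypothetical. Two functions each have their own local word_count, and a third uses a different name with no change in behavior.

```python
def build_counts():
    word_count = {"cat": 2}      # local to build_counts()
    return word_count


def report_same_name():
    word_count = build_counts()  # a separate local that merely shares the name
    return word_count["cat"]


def report_other_name():
    totals = build_counts()      # renaming the local changes nothing
    return totals["cat"]


print(report_same_name() == report_other_name())  # prints True
```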
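
For point 4, this sketch mimics the length check on a plain list so it can be tried without the command line. The helper parse_args and the file name alice.txt are assumptions made for the example; the real script reads sys.argv directly.

```python
def parse_args(argv):
    """Return (option, filename), or None if argv does not hold exactly 3 items."""
    if len(argv) != 3:
        return None
    # argv[0] is the script name, so the two real arguments are at 1 and 2.
    return argv[1], argv[2]


# Script name plus two arguments: three items, so the check passes.
print(parse_args(["./wordcount.py", "--count", "alice.txt"]))

# Script name plus one argument: two items, so the check fails.
print(parse_args(["./wordcount.py", "--count"]))
```

Note that len() counts the items in the list, not the characters in any of the strings.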