Question

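I need to accumulate the count of each of the 26 letters of the alphabet in a body of text into a dictionary. When the user types a letter, I need to display that letter's frequency in the text. How do I go about this?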

This is my code so far:

import urllib2
import numpy as py
import matplotlib

response = urllib2.urlopen('http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt')

alphabet = 'abcdefghijklmnopqrstuvwxyz'

# initialize the dict we will use to store our
# counts for the individual letters:
alphabet_counts = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 0, 'g': 0, 'h': 0,\
'i': 0, 'j': 0, 'k': 0, 'l': 0, 'm': 0, 'n': 0, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 's': 0,\
't': 0, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}

total_letter_count = 0

# loop thru line by line:
for line in response:
    line = line.lower()

    for ch in line:
        if ch in alphabet:
            alphabet_counts[ch] += 1
            total_letter_count += 1


print('# of a\'s: ' + str(alphabet_counts['a']))
print('# of b\'s: ' + str(alphabet_counts['b']))
print('# of c\'s: ' + str(alphabet_counts['c']))
print('# of d\'s: ' + str(alphabet_counts['d']))
print('# of e\'s: ' + str(alphabet_counts['e']))
print('# of f\'s: ' + str(alphabet_counts['f']))
print('# of g\'s: ' + str(alphabet_counts['g']))
print('# of h\'s: ' + str(alphabet_counts['h']))
print('# of i\'s: ' + str(alphabet_counts['i']))
print('# of j\'s: ' + str(alphabet_counts['j']))
print('# of k\'s: ' + str(alphabet_counts['k']))
print('# of l\'s: ' + str(alphabet_counts['l']))
print('# of m\'s: ' + str(alphabet_counts['m']))
print('# of n\'s: ' + str(alphabet_counts['n']))
print('# of o\'s: ' + str(alphabet_counts['o']))
print('# of p\'s: ' + str(alphabet_counts['p']))
print('# of q\'s: ' + str(alphabet_counts['q']))
print('# of r\'s: ' + str(alphabet_counts['r']))
print('# of s\'s: ' + str(alphabet_counts['s']))
print('# of t\'s: ' + str(alphabet_counts['t']))
print('# of u\'s: ' + str(alphabet_counts['u']))
print('# of v\'s: ' + str(alphabet_counts['v']))
print('# of w\'s: ' + str(alphabet_counts['w']))
print('# of x\'s: ' + str(alphabet_counts['x']))
print('# of y\'s: ' + str(alphabet_counts['y']))
print('# of z\'s: ' + str(alphabet_counts['z']))


resp = '''
1.) Find probability of a particular letter of the alphabet 
2.) Show the barplot representing these probabilities for the entire alphabet
3.) Save that barplot as a png file
4.) Quit'''

Answer 1

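I'm not sure what you really want, and since you are just learning I'll give you a hint rather than the answer. dict objects have methods named items and iteritems, which give you the keys together with their values. To compute the probability of getting a given character, you can use iteritems: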

char_probabilities = dict()
for character, count in alphabet_counts.iteritems():
    # compute the probability from the frequency of
    # the character; here you can use the sum builtin
    # and the values method on the dict
    char_probabilities[character] = [YOU DO SOME WORK]
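
For readers on Python 3, where iteritems no longer exists and items takes its place, the iteration pattern from the hint might look like the following minimal sketch. The toy_counts dict and its values are made up for illustration and are not taken from the text being analysed.

```python
# Python 3 sketch: dict.items() yields (key, value) pairs,
# replacing Python 2's iteritems(). toy_counts is invented data.
toy_counts = {'a': 3, 'b': 1}

pairs = []
for character, count in toy_counts.items():
    pairs.append((character, count))

# sum() over values() gives the total number of letters counted
total = sum(toy_counts.values())
```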

Answer 2

I cleaned up the code a bit and also added an example of how to use raw_input (since your use of urllib2 suggests you are on Python 2.7):

import urllib2
import numpy as py
import matplotlib

response = urllib2.urlopen('http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt')

alphabet = 'abcdefghijklmnopqrstuvwxyz'

# initialize the dict we will use to store our
# counts for the individual letters:
alphabet_counts = {letter: 0 for letter in alphabet}

total_letter_count = 0

# loop thru line by line:
for line in response:
    line = line.lower()

    for ch in line:
        if ch in alphabet:
            alphabet_counts[ch] += 1
            total_letter_count += 1

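for letter in alphabet:
    print('# of ' + letter + '\'s: ' + str(alphabet_counts[letter]))

# normalise the input so that 'A' or ' a' still finds the key 'a'
letter = raw_input("Enter a character: ").strip().lower()
if letter in alphabet_counts:
    print('# of ' + letter + '\'s: ' + str(alphabet_counts[letter]))
else:
    print('Please enter a single letter from a to z.')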

resp = '''
1.) Find probability of a particular letter of the alphabet 
2.) Show the barplot representing these probabilities for the entire alphabet
3.) Save that barplot as a png file
4.) Quit'''

Answer 3

This looks like a textbook use of the Counter class:

import collections, urllib2, contextlib

url = 'http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt'
alphabet_counts = collections.Counter()
with contextlib.closing(urllib2.urlopen(url)) as response:
    for line in response:
        alphabet_counts.update(x for x in line.lower() if x.isalpha())

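A Counter is a dict subclass that behaves like your original alphabet_counts, but with far less work on your part. Keep in mind that you will probably want to close the input stream, which is why I used the with block.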
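
A small sketch of that Counter behaviour, written for Python 3 and run on a made-up string rather than the novel:

```python
import collections

# Sample text is invented for illustration.
sample_counts = collections.Counter()
sample_counts.update(ch for ch in "Hello, World".lower() if ch.isalpha())

# Counter is a dict subclass, so normal dict access works...
is_dict = isinstance(sample_counts, dict)
l_count = sample_counts['l']      # 'l' appears three times
# ...but a missing key reports 0 instead of raising KeyError.
z_count = sample_counts['z']
```

Because missing keys report 0, there is no need to pre-fill the counter with all 26 letters.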

To get frequencies, you need to know the sum of the counter's values:

total_letters = sum(alphabet_counts.values())
frequencies = {letter: float(count) / total_letters for letter, count in alphabet_counts.iteritems()}
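
For Python 3, where urllib2 and iteritems are unavailable, one possible way to put the pieces together is sketched below. The helper name letter_frequencies is hypothetical, and the function takes an already-downloaded string so that it can be tried without network access. It filters against string.ascii_lowercase rather than calling isalpha, on the assumption that only the 26 English letters should be counted, since str.isalpha in Python 3 also accepts accented and other non-English letters.

```python
import collections
import string

def letter_frequencies(text):
    """Return {letter: relative frequency} for the a-z letters in text."""
    counts = collections.Counter(
        ch for ch in text.lower() if ch in string.ascii_lowercase
    )
    total_letters = sum(counts.values())
    if total_letters == 0:
        # no letters at all: avoid dividing by zero
        return {}
    return {letter: count / total_letters for letter, count in counts.items()}

frequencies = letter_frequencies("Abba!")
# .get() returns 0.0 for letters that never occur in the text
a_frequency = frequencies.get('a', 0.0)
z_frequency = frequencies.get('z', 0.0)
```

A letter typed by the user could then be looked up with frequencies.get(letter.lower(), 0.0).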
