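I need to accumulate a count for each of the 26 letters of the alphabet in a body of text and store the counts in a dictionary. When the user types a letter, I need to display that letter's frequency in the text. How can I do this?
Here is my code so far: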
import urllib2
import numpy as py
import matplotlib
response = urllib2.urlopen('http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt')
alphabet = 'abcdefghijklmnopqrstuvwxyz'
# initialize the dict we will use to store our
# counts for the individual vowels:
alphabet_counts = {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 0, 'f': 0, 'g': 0, 'h': 0,\
'i': 0, 'j': 0, 'k': 0, 'l': 0, 'm': 0, 'n': 0, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 's': 0,\
't': 0, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
total_letter_count = 0
# loop thru line by line:
for line in response:
    line = line.lower()
    for ch in line:
        if ch in alphabet:
            alphabet_counts[ch] += 1
            total_letter_count += 1
print('# of a\'s: ' + str(alphabet_counts['a']))
print('# of b\'s: ' + str(alphabet_counts['b']))
print('# of c\'s: ' + str(alphabet_counts['c']))
print('# of d\'s: ' + str(alphabet_counts['d']))
print('# of e\'s: ' + str(alphabet_counts['e']))
print('# of f\'s: ' + str(alphabet_counts['f']))
print('# of g\'s: ' + str(alphabet_counts['g']))
print('# of h\'s: ' + str(alphabet_counts['h']))
print('# of i\'s: ' + str(alphabet_counts['i']))
print('# of j\'s: ' + str(alphabet_counts['j']))
print('# of k\'s: ' + str(alphabet_counts['k']))
print('# of l\'s: ' + str(alphabet_counts['l']))
print('# of m\'s: ' + str(alphabet_counts['m']))
print('# of n\'s: ' + str(alphabet_counts['n']))
print('# of o\'s: ' + str(alphabet_counts['o']))
print('# of p\'s: ' + str(alphabet_counts['p']))
print('# of q\'s: ' + str(alphabet_counts['q']))
print('# of r\'s: ' + str(alphabet_counts['r']))
print('# of s\'s: ' + str(alphabet_counts['s']))
print('# of t\'s: ' + str(alphabet_counts['t']))
print('# of u\'s: ' + str(alphabet_counts['u']))
print('# of v\'s: ' + str(alphabet_counts['v']))
print('# of w\'s: ' + str(alphabet_counts['w']))
print('# of x\'s: ' + str(alphabet_counts['x']))
print('# of y\'s: ' + str(alphabet_counts['y']))
print('# of z\'s: ' + str(alphabet_counts['z']))
resp = '''
1.) Find probability of a particular letter of the alphabet
2.) Show the barplot representing these probabilities for the entire alphabet
3.) Save that barplot as a png file
4.) Quit'''
Answer 0 (score: 0)
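I'm not sure exactly what you want, and since you are still learning, I'll give you a hint rather than a full answer. dict objects have methods called items and iteritems, which give you the keys together with their values. To compute the probability of a given character, you can use iteritems: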
char_probabilities = dict()
for character, count in alphabet_counts.iteritems():
    # compute the probability from the frequency of
    # this character; here you can use the sum builtin
    # and the values method on the dict
    char_probabilities[character] = [YOU DO SOME WORK]
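For readers who want to check their own attempt, one way the placeholder might be filled in is to divide each count by the sum of all counts. This is only a sketch, not the answerer's own solution: the sample counts are made up for illustration, and `.items()` is used in place of `iteritems` so that the snippet runs on both Python 2.7 and Python 3.

```python
# Made-up counts standing in for the real alphabet_counts dict.
alphabet_counts = {'a': 3, 'b': 1, 'c': 0, 'd': 4}

# Total number of letters seen: the denominator for every probability.
total_letter_count = sum(alphabet_counts.values())

char_probabilities = dict()
for character, count in alphabet_counts.items():
    if total_letter_count:
        # float() avoids integer division on Python 2.
        char_probabilities[character] = float(count) / total_letter_count
    else:
        # Empty text: no letters were counted, so avoid dividing by zero.
        char_probabilities[character] = 0.0

print(char_probabilities['d'])
```

The probabilities computed this way should sum to 1 whenever at least one letter was counted.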
Answer 1 (score: 0)
I cleaned up the code a bit and added an example of how to use raw_input (since you appear to be on Python 2.7, judging by your use of urllib2):
import urllib2
import numpy as py
import matplotlib
response = urllib2.urlopen('http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt')
alphabet = 'abcdefghijklmnopqrstuvwxyz'
# initialize the dict we will use to store our
# counts for the individual letters:
alphabet_counts = {letter: 0 for letter in alphabet}
total_letter_count = 0
# loop thru line by line:
for line in response:
    line = line.lower()
    for ch in line:
        if ch in alphabet:
            alphabet_counts[ch] += 1
            total_letter_count += 1
# iterate over the alphabet string so the output is in alphabetical order
for letter in alphabet:
    print('# of ' + letter + '\'s: ' + str(alphabet_counts[letter]))
# strip and lower-case the input so that 'A' or 'a ' still matches a key
letter = raw_input("Enter a character: ").strip().lower()
if letter in alphabet_counts:
    print('# of ' + letter + '\'s: ' + str(alphabet_counts[letter]))
else:
    print('Not one of the 26 letters: ' + letter)
resp = '''
1.) Find probability of a particular letter of the alphabet
2.) Show the barplot representing these probabilities for the entire alphabet
3.) Save that barplot as a png file
4.) Quit'''
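The first menu option asks for the probability of a particular letter. A minimal sketch of that step follows; `letter_probability` and `sample_counts` are hypothetical names introduced here, not part of the original code. The function takes the counts dict and whatever the user typed, and returns None when the input is not a single letter from a to z.

```python
import string

def letter_probability(counts, user_input):
    """Return the share of the typed letter among all counted letters,
    or None if the input is not a single a-z letter."""
    letter = user_input.strip().lower()
    if len(letter) != 1 or letter not in string.ascii_lowercase:
        return None
    total = sum(counts.values())
    if total == 0:
        # Nothing was counted, so every letter has probability zero.
        return 0.0
    # float() avoids integer division on Python 2.
    return float(counts.get(letter, 0)) / total

# Made-up counts standing in for the real alphabet_counts dict.
sample_counts = {'a': 6, 'b': 2, 'e': 12}
print(letter_probability(sample_counts, 'E'))  # 12 of the 20 sample letters
print(letter_probability(sample_counts, '?'))  # not a letter
```

In the Python 2.7 script above, the string returned by raw_input could be passed as `user_input`.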
Answer 2 (score: 0)
This looks like a textbook use of the Counter class:
import collections, urllib2, contextlib
url = 'http://students.healthinformaticshub.ca/jane-austen-sense-n-sensibility.txt'
alphabet_counts = collections.Counter()
with contextlib.closing(urllib2.urlopen(url)) as response:
    for line in response:
        alphabet_counts.update(x for x in line.lower() if x.isalpha())
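Counter is a dict subclass that behaves like your original alphabet_counts, but with far less work on your part. Keep in mind that you will probably want to close the input stream, which is why I used a with block.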
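Note that isalpha() can accept more than the 26 English letters (on Python 3, for instance, it is also true for accented characters). If only a to z should be counted, a variation along the following lines might work. It is a sketch under stated assumptions: `count_letters` and `sample_lines` are names invented here, and the lines come from an in-memory list rather than the URL so that the snippet runs without network access.

```python
import collections
import string

def count_letters(lines):
    """Count only the letters a-z, case-insensitively,
    over an iterable of text lines."""
    allowed = set(string.ascii_lowercase)
    counts = collections.Counter()
    for line in lines:
        # Keep a character only if it is one of the 26 letters.
        counts.update(ch for ch in line.lower() if ch in allowed)
    return counts

# Invented sample text standing in for the downloaded file.
sample_lines = ["Sense and Sensibility\n", "by Jane Austen\n"]
alphabet_counts = count_letters(sample_lines)
print(alphabet_counts.most_common(3))
```

A Counter returns 0 for keys it has never seen, so looking up a letter that does not occur in the text will not raise a KeyError.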
To get the frequencies, you need to know the sum of the counter's values:
total_letters = sum(alphabet_counts.values())
# iteritems yields (key, value) pairs, i.e. (letter, count)
frequencies = {letter: float(count) / total_letters for letter, count in alphabet_counts.iteritems()}