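I have a .txt file with about 200,000 lines, one word per line, and I need to count how many times each letter appears as the first letter of a word. I have a dictionary with the keys 'a' - 'z', and each count is assigned as the value. I need to print it out in the form
a:10,978 b:7,890 c:12,201 d:9,562 e:6,008
f:7,095 g:5,660 (...)
The dictionary currently prints like this:
[('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505), ('k'...
How do I remove the brackets and parentheses and print only 5 counts per line? Also, after I sorted the dictionary by key, it started printing as key, value instead of key:value.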
def main():
    file_name = open('dictionary.txt', 'r').readlines()
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    letter = {}
    for i in alphabet:
        letter[i] = 0
    for n in letter:
        for p in file_name:
            if p.startswith(n):
                letter[n] = letter[n] + 1
    letter = sorted(letter.items())
    print(letter)

main()
Answer 0 (score: 1)
You can use the following.
It iterates over your list in groups of 5 elements and prints each group in the desired format.
In [15]:
letter = [('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505)]
Replace print(letter) with the following:
for grp in range(0, len(letter), 5):
    print(' '.join(elm[0] + ':' + '{:,}'.format(elm[1]) for elm in letter[grp:grp+5]))
a:10,898 b:9,950 c:17,045 d:10,675 e:7,421
f:7,138 g:5,998 h:6,619 i:7,128 j:1,505
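On the second part of the question: sorted(letter.items()) returns a list of (key, value) tuples rather than a dict, which is why the output switched to the ('a', 10898) form. Here is a minimal sketch of the same grouping idea wrapped in a function. The name format_counts and its per_line parameter are hypothetical, introduced only for illustration, and the counts are a small made-up sample.

```python
def format_counts(pairs, per_line=5):
    """Return display lines such as 'a:10,898 b:9,950', per_line pairs each."""
    lines = []
    for start in range(0, len(pairs), per_line):
        group = pairs[start:start + per_line]
        # '{:,}' inserts the thousands separator into each count
        lines.append(' '.join('{}:{:,}'.format(key, count) for key, count in group))
    return lines


# sorted() on dict.items() yields a list of (key, value) tuples, not a dict
counts = {'b': 9950, 'a': 10898, 'c': 17045}
pairs = sorted(counts.items())
for line in format_counts(pairs, per_line=2):
    print(line)
```

With per_line=2 this prints a:10,898 b:9,950 on the first line and c:17,045 on the second. Leaving per_line at 5 gives the layout asked for in the question.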
Answer 1 (score: 0)
A collections.Counter dict will get the count of the first letter of every line; then split the sorted items into chunks and join them:
from collections import Counter

with open('dictionary.txt') as f:  # automatically closes your file
    # iterate once over the file object as opposed to storing 200k lines
    # and 26 iterations over the lines
    c = Counter(line[0] for line in f)

srt = sorted(c.items())
# create five element chunks from the sorted items
chunks = (srt[i:i+5] for i in range(0, len(srt), 5))
for chk in chunks:
    # format and join (names chosen so they do not shadow the Counter c)
    print(" ".join("{}:{:,}".format(letter, count) for letter, count in chk))
If you might have anything other than the letters a-z, use isalpha in the generator expression:
c = Counter(line[0] for line in f if line[0].isalpha())
The format specifier for the thousands separator was added in Python 2.7.
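To try the Counter approach without the real file, here is a rough sketch that runs on an in-memory list of lines. The function name count_first_letters and the sample data are made up for illustration. Lowercasing the first letter is my own assumption (so that "Avocado" counts under a), and the f-string needs Python 3.6 or later. Note that str.isalpha() is also true for non-ASCII letters such as é, so add a stricter check if you need exactly a-z.

```python
from collections import Counter


def count_first_letters(lines):
    """Count the lowercased first letter of each non-empty line; skip non-letters."""
    return Counter(
        word[0].lower()
        for word in (line.strip() for line in lines)
        if word and word[0].isalpha()
    )


# Hypothetical sample standing in for the lines of dictionary.txt
sample = ["apple\n", "Avocado\n", "banana\n", "\n", "3d\n", "cherry\n"]
counts = count_first_letters(sample)
print(" ".join(f"{letter}:{n:,}" for letter, n in sorted(counts.items())))
```

The blank line and "3d" are skipped, so this prints a:2 b:1 c:1. Passing an open file object instead of the list should work the same way, since both yield one line at a time.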