Question

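I have a .txt file with about 200,000 lines, one word per line, and I need to count how many times each letter appears as the first letter of a word. I have a dictionary with the keys 'a' - 'z', with the count assigned to each value. I need to print it in the form: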

a:10,978 b:7,890 c:12,201 d:9,562 e:6,008
f:7,095 g:5,660 (...)

The dictionary currently prints like this:

[('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505), ('k'...

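How do I remove the brackets and parentheses and print only 5 counts per line? Also, after I sorted the dictionary by key, it started printing as key, value instead of key:value.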

def main():
    file_name = open('dictionary.txt', 'r').readlines()
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    letter = {}
    for i in alphabet:
        letter[i]=0
    for n in letter:
        for p in file_name:
            if p.startswith(n):
                letter[n] = letter[n]+1
    letter = sorted(letter.items())
    print(letter)
main()

Answer 1

You can use the following. It iterates over your list in groups of 5 elements and prints each group in the desired format.

Given this list:

letter = [('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505)]

Replace print(letter) with the following:

for grp in range(0, len(letter), 5):
    print(' '.join(elm[0] + ':' + '{:,}'.format(elm[1]) for elm in letter[grp:grp+5]))




a:10,898 b:9,950 c:17,045 d:10,675 e:7,421
f:7,138 g:5,998 h:6,619 i:7,128 j:1,505
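
If you want the row width to be adjustable, the same grouping can be wrapped in a small helper. This is only a sketch: the name `format_counts` and the `per_line` argument are made up for illustration, and it assumes the input is a sorted list of `(key, count)` tuples like the one in the question.

```python
def format_counts(pairs, per_line=5):
    """Return lines of 'key:count' entries, per_line entries per line.

    `pairs` is a sequence of (key, count) tuples, e.g. sorted(d.items()).
    format_counts and per_line are hypothetical names for this sketch.
    """
    lines = []
    for start in range(0, len(pairs), per_line):
        group = pairs[start:start + per_line]
        # '{:,}' inserts the thousands separator into each count
        lines.append(' '.join('{}:{:,}'.format(key, count) for key, count in group))
    return lines


letter = [('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421),
          ('f', 7138), ('g', 5998)]
for line in format_counts(letter):
    print(line)
```

The last row simply comes out shorter when the number of entries is not a multiple of `per_line`.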

Answer 2

A collections.Counter dict will get the count of the first letter of every line; then split the sorted items into chunks and join:

from collections import Counter

with open('dictionary.txt') as f: # automatically closes your file
    # iterate once over the file object as opposed to storing 200k lines
    # and 26 iterations over the lines
    counts = Counter(line[0] for line in f)
    srt = sorted(counts.items())
    # create five element chunks from the sorted items
    chunks = (srt[i:i+5] for i in range(0, len(srt), 5))
    for chk in chunks:
        # format and join; distinct names avoid reusing the Counter's name
        print(" ".join("{}:{:,}".format(key, count) for key, count in chk))

If you might have something other than the letters a-z, use isalpha in the loop:

c = Counter(line[0] for line in f if line[0].isalpha())
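
To see what the filter does, here is a small self-contained sketch. It assumes an in-memory `io.StringIO` as a stand-in for the real file, and the sample words are invented for illustration.

```python
import io
from collections import Counter

# Stand-in for open('dictionary.txt'); the words are made-up sample data.
sample = io.StringIO("apple\navocado\nbanana\n\n3d\ncherry\n")

# A blank line is read as "\n", so line[0] is "\n" and isalpha() rejects it;
# the line starting with a digit is skipped the same way.
first_letters = Counter(line[0] for line in sample if line[0].isalpha())
print(sorted(first_letters.items()))
```

Only the lines starting with a letter are counted, so the blank line and the `3d` entry do not show up in the result.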

The format specifier for a thousands separator was added in Python 2.7.
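
As a quick illustration of that specifier on its own (the variable name `count` is just an example):

```python
count = 10898

# The ',' in the format spec groups digits in threes with commas.
print('{:,}'.format(count))

# The built-in format() accepts the same spec without the braces.
print(format(count, ','))
```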
