Question

Given a long list such as

words = ['axcd', 'abcd', 'abef', 'abxf']

where every string has the same length, how can I compute an array of per-position character frequencies like this:

result[0] = [{char: 'a', freq: 4}] 
result[1] = [{char: 'b', freq: 3}, {char: 'x', freq: 1}] # ordered by frequencies
result[2] = [{char: 'c', freq: 2}, {char: 'e', freq: 1}, {char: 'x', freq: 1}]
result[3] = [{char: 'd', freq: 2}, {char: 'f', freq: 2}]

What is the most efficient way to do this?

Answer 1

Here is one way to do it with collections.Counter and zip.

For clarity, I have defined the formatter function explicitly.

from collections import Counter

words = ['axcd', 'abcd', 'abef', 'abxf']

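def formatter(res):
    # Turn a Counter into a list of {'char', 'freq'} dicts, most frequent first.
    return [{'char': k, 'freq': v} for k, v in sorted(res.items(),
            key=lambda x: x[1], reverse=True)]

# zip(*words) yields one tuple of characters per position; count each tuple.
result = dict(enumerate(formatter(Counter(i)) for i in zip(*words)))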

Result:

{0: [{'char': 'a', 'freq': 4}],
 1: [{'char': 'b', 'freq': 3}, {'char': 'x', 'freq': 1}],
 2: [{'char': 'c', 'freq': 2}, {'char': 'e', 'freq': 1}, {'char': 'x', 'freq': 1}],
 3: [{'char': 'd', 'freq': 2}, {'char': 'f', 'freq': 2}]}
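A possible simplification, assuming Python 3.7+: Counter.most_common() already returns (element, count) pairs sorted from highest to lowest count, with ties kept in first-seen order, so the explicit sort inside formatter can be dropped. This is a sketch of that variant, not the answerer's code.

```python
from collections import Counter

words = ['axcd', 'abcd', 'abef', 'abxf']

# most_common() with no argument returns every (char, count) pair,
# highest count first; equal counts stay in the order first encountered.
result = {
    position: [{'char': ch, 'freq': n} for ch, n in Counter(column).most_common()]
    for position, column in enumerate(zip(*words))
}

print(result)
```

The printed dict should match the result shown above.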

Answer 2

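Use the old zip(*words) trick to transpose the words
Use a Counter to count the letters in each row
Sort each counter's elements by value
Convert the sorted elements into OrderedDicts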

import collections
import operator

words = ['axcd', 'abcd', 'abef', 'abxf']

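# In Python 3, zip() returns a lazy iterator; list() makes it printable and reusable.
transposed = list(zip(*words))
# Count the letters at each position.
counts = [collections.Counter(letters) for letters in transposed]
# Sort each counter's (letter, count) pairs by count, highest first.
sorted_counts = [sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
                       for dic in counts]
# Store each sorted list in a mapping that remembers its order.
result = [collections.OrderedDict(items) for items in sorted_counts]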

# result:
# [OrderedDict([('a', 4)]),
#  OrderedDict([('b', 3), ('x', 1)]),
#  OrderedDict([('c', 2), ('e', 1), ('x', 1)]),
#  OrderedDict([('d', 2), ('f', 2)])]
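A hedged side note: since Python 3.7, plain dicts preserve insertion order, so if only the ordering of the items matters, dict() can stand in for OrderedDict. The sketch below assumes Python 3.7+ and collapses the steps into one comprehension.

```python
import collections
import operator

words = ['axcd', 'abcd', 'abef', 'abxf']

# Plain dicts keep insertion order (Python 3.7+), so the sorted order survives.
result = [
    dict(sorted(collections.Counter(letters).items(),
                key=operator.itemgetter(1), reverse=True))
    for letters in zip(*words)
]

print(result)
# [{'a': 4}, {'b': 3, 'x': 1}, {'c': 2, 'e': 1, 'x': 1}, {'d': 2, 'f': 2}]
```

One difference to keep in mind: equality between two OrderedDicts is order-sensitive, while equality between plain dicts is not.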

To give more insight into what each step does, here are the intermediate results.

After transposing, the input looks like this:

>>> transposed
[('a', 'a', 'a', 'a'),
 ('x', 'b', 'b', 'b'),
 ('c', 'c', 'e', 'x'),
 ('d', 'd', 'f', 'f')]
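The listing above shows a list, which is what Python 2's zip() returned. In Python 3, zip() returns a lazy iterator that can be consumed only once, so it has to be wrapped in list() before it can be printed like this or iterated a second time. A short illustration:

```python
words = ['axcd', 'abcd', 'abef', 'abxf']

transposed = zip(*words)      # a lazy zip iterator in Python 3, not a list
columns = list(transposed)    # materialize it so it can be printed or reused
print(columns)
# [('a', 'a', 'a', 'a'), ('x', 'b', 'b', 'b'), ('c', 'c', 'e', 'x'), ('d', 'd', 'f', 'f')]

# The iterator is now exhausted; iterating it again yields nothing.
print(list(transposed))
# []
```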

These tuples are then converted into counters:

>>> counts
[Counter({'a': 4}),
 Counter({'b': 3, 'x': 1}),
 Counter({'c': 2, 'e': 1, 'x': 1}),
 Counter({'d': 2, 'f': 2})]
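As an alternative to transposing first, one could keep one Counter per position and update them while walking the words once. This is a sketch that assumes a non-empty list where every word has the same length, as in the question; whether it is faster than zip(*words) plus one Counter per column is not established here and would need to be measured on real data.

```python
from collections import Counter

words = ['axcd', 'abcd', 'abef', 'abxf']

# One Counter per character position (assumes equal-length, non-empty input).
counts = [Counter() for _ in range(len(words[0]))]
for word in words:
    for position, ch in enumerate(word):
        counts[position][ch] += 1

print(counts)
```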

Sorting them turns each counter into a list of (key, value) tuples:

>>> sorted_counts
[[('a', 4)],
 [('b', 3), ('x', 1)],
 [('c', 2), ('e', 1), ('x', 1)],
 [('d', 2), ('f', 2)]]
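How ties are ordered here follows from sort stability: Python's sorted() is stable, and reverse=True preserves that stability, so letters with equal counts keep the order in which the Counter saw them. If a deterministic alphabetical tie-break is preferred instead, one option is to sort on (-count, letter). The input below is a hypothetical example chosen to contain a tie.

```python
import operator

# Hypothetical (letter, count) pairs in first-seen order, with a tie on count 1.
items = [('x', 1), ('b', 3), ('e', 1)]

# Stable sort, even with reverse=True: 'x' stays ahead of 'e'.
by_count = sorted(items, key=operator.itemgetter(1), reverse=True)
print(by_count)  # [('b', 3), ('x', 1), ('e', 1)]

# Break ties alphabetically by sorting on (-count, letter).
by_count_then_letter = sorted(items, key=lambda kv: (-kv[1], kv[0]))
print(by_count_then_letter)  # [('b', 3), ('e', 1), ('x', 1)]
```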

In the final step, they are converted into OrderedDicts.
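If the exact list-of-dicts layout from the question is needed, the OrderedDicts can be reshaped with one more comprehension. The sketch below hard-codes the result shown above as its input; the name formatted is illustrative only.

```python
from collections import OrderedDict

# The OrderedDicts produced above, copied in so this snippet runs on its own.
result = [OrderedDict([('a', 4)]),
          OrderedDict([('b', 3), ('x', 1)]),
          OrderedDict([('c', 2), ('e', 1), ('x', 1)]),
          OrderedDict([('d', 2), ('f', 2)])]

# Reshape into the [{'char': ..., 'freq': ...}, ...] layout asked for in the question.
formatted = [[{'char': ch, 'freq': n} for ch, n in od.items()] for od in result]
print(formatted[1])
# [{'char': 'b', 'freq': 3}, {'char': 'x', 'freq': 1}]
```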

Python: Efficiently computing character frequencies in a list of words
