Question

I have a Python list like this:

List = ['s', 'b', 's', 'd', 'h', 'a', 'h', 'e', 'h', 'a']

Is there an easy way to find out which pairs of adjacent letters occur most often?

In my list:

h,a = 2x

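It would also be very cool to have a full table showing how often each letter is followed by each other letter, but I'm not sure how to approach this: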

  b d a e
s 1 1
h     2 1

Answer 1

You can use zip and collections.Counter to get the consecutive pairs together with their frequencies:

>>> from collections import Counter
>>> l = ['s', 'b', 's', 'd', 'h', 'a', 'h', 'e', 'h', 'a']
>>> c = Counter(zip(l, l[1:]))
>>> c
Counter({('h', 'a'): 2, ('d', 'h'): 1, ('s', 'b'): 1, ('s', 'd'): 1, ('b', 's'): 1, ('h', 'e'): 1, ('a', 'h'): 1, ('e', 'h'): 1})

Then, with the most_common method, you can get the most common pair:

>>> c.most_common(1)
[(('h', 'a'), 2)]
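most_common(1) returns exactly one pair, so if several pairs tie for the highest count, the others are silently dropped. A minimal sketch that keeps every tied pair (the names letters, top and most_frequent are illustrative, not from the answer):

```python
from collections import Counter

letters = ['s', 'b', 's', 'd', 'h', 'a', 'h', 'e', 'h', 'a']
pair_counts = Counter(zip(letters, letters[1:]))

# Highest count of any pair (0 if the list has fewer than two elements).
top = max(pair_counts.values(), default=0)

# Every pair that reaches that count, not only the first one most_common(1) returns.
most_frequent = [pair for pair, n in pair_counts.items() if n == top]
print(most_frequent, top)  # [('h', 'a')] 2
```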

Answer 2

Use collections.Counter() and feed it the letter pairs:

from collections import Counter

pair_counts = Counter(zip(List, List[1:]))

Demo:

>>> from collections import Counter
>>> List = ['s', 'b' , 's', 'd', 'h', 'a', 'h', 'e', 'h', 'a']
>>> pair_counts = Counter(zip(List, List[1:]))
>>> pair_counts.most_common()
[(('h', 'a'), 2), (('d', 'h'), 1), (('s', 'b'), 1), (('s', 'd'), 1), (('b', 's'), 1), (('h', 'e'), 1), (('a', 'h'), 1), (('e', 'h'), 1)]
>>> pair_counts.most_common(1)
[(('h', 'a'), 2)]

The counter can also be used to produce your table:

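# Sorted unique letters serve as both row and column labels.
values = sorted(set(List))
# Column width = number of digits in the largest pair count.
colwidth = len(str(pair_counts.most_common(1)[0][1]))
# One label cell followed by one right-aligned cell per letter.
row_template = '{} ' + ' '.join(['{:>{colwidth}}'] * len(values))
print(row_template.format(' ', colwidth=colwidth, *values))
for a in values:
    # Counter.get returns '' (a blank cell) for pairs that never occur.
    print(row_template.format(a, colwidth=colwidth, *(
        pair_counts.get((a, b), '') for b in values)))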

This produces:

  a b d e h s
a         1  
b           1
d         1  
e         1  
h 2     1    
s   1 1
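If you want the table as a data structure rather than printed text, one option (a sketch, not part of the answer above; the name table is illustrative) is a dict of Counters keyed by the first letter of each pair:

```python
from collections import Counter, defaultdict

letters = ['s', 'b', 's', 'd', 'h', 'a', 'h', 'e', 'h', 'a']

# table[first][second] = how many times `second` directly follows `first`.
table = defaultdict(Counter)
for first, second in zip(letters, letters[1:]):
    table[first][second] += 1

# Each row is itself a Counter, so per-letter queries come for free.
print(table['h'].most_common())  # [('a', 2), ('e', 1)]
print(table['s']['d'])           # 1
print(table['a']['b'])           # 0 (missing entries count as zero)
```

Note that looking up a letter that never appears (e.g. table['x']) inserts an empty row, because table is a defaultdict.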

Answer 3

Here is one way to produce the table you want (without using Counter):

s = 'sbsdhaheha'

pairs = {}
for i,l in enumerate(s[:-1]):
    try:
        pairs[l + s[i+1]] += 1
    except KeyError:
        pairs[l + s[i+1]] = 1

l2s = sorted(set(pair[1] for pair in pairs))
print(' ', ' '.join(l2s))
for l1 in sorted(set(pair[0] for pair in pairs)):
    row = []
    for l2 in l2s:
        try:
            row.append(str(pairs[l1+l2]))
        except KeyError:
            row.append(' ')
    print(l1, ' '.join(row))

Output:

  a b d e h s
a         1  
b           1
d         1  
e         1  
h 2     1    
s   1 1
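The try/except KeyError pattern works; a slightly shorter variant of the same no-Counter approach (a sketch, not the answerer's code) uses dict.get with a default of 0. One difference: here the row and column labels are all letters in the string, rather than only those that appear first or second in some pair; for this input the two sets are identical.

```python
s = 'sbsdhaheha'

# Count each adjacent two-letter substring; dict.get supplies 0 for unseen keys.
pairs = {}
for a, b in zip(s, s[1:]):
    pairs[a + b] = pairs.get(a + b, 0) + 1

letters = sorted(set(s))
print(' ', ' '.join(letters))
for a in letters:
    # Blank cell where the pair never occurs.
    print(a, ' '.join(str(pairs[a + b]) if a + b in pairs else ' ' for b in letters))
```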

Answer 4

To avoid building a new list (the slice l[1:] creates a copy), you can use itertools.islice:

from collections import Counter
from itertools import islice
l = ["s", "b", "s", "d", "h", "a", "h", "e", "h", "a"]
Counter(zip(l, islice(l, 1, None))).most_common()

If you are using Python 2, use izip, since Python 2's zip builds a full list:

from collections import Counter
from itertools import islice, izip
Counter(izip(l, islice(l, 1, None))).most_common()

Find the most common letters in a Python list