Counting runs in a string

Time: 2013-09-06 19:51:31

Tags: python string

I have a string that looks like:

string = 'TTHHTHHTHHHHTTHHHTTT'

How can I count the number of runs in the string, so that I get:

5 runs of T and 4 runs of H

2 answers:

Answer 0: (score: 21)

You can use itertools.groupby in combination with collections.Counter:

>>> from itertools import groupby
>>> from collections import Counter
>>> strs = 'TTHHTHHTHHHHTTHHHTTT'
>>> Counter(k for k, g in groupby(strs))
Counter({'T': 5, 'H': 4})

itertools.groupby groups consecutive items that share the same key. (By default, the key is the item from the iterable itself.)

>>> from pprint import pprint
>>> pprint([(k, list(g)) for k, g in groupby(strs)])
[('T', ['T', 'T']),
 ('H', ['H', 'H']),
 ('T', ['T']),
 ('H', ['H', 'H']),
 ('T', ['T']),
 ('H', ['H', 'H', 'H', 'H']),
 ('T', ['T', 'T']),
 ('H', ['H', 'H', 'H']),
 ('T', ['T', 'T', 'T'])]

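Here the first item of each pair is the key (k) by which the items are grouped, and list(g) is the group belonging to that key. Since each run produces exactly one key and we are only interested in the key part, we can pass the k values to collections.Counter to get the desired answer.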
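To make the idea of "one key per run" concrete, here is a minimal sketch of the same count written as a plain loop, without itertools. The helper name count_runs is hypothetical and not part of the original answer; it simply starts a new run whenever the current character differs from the previous one.

```python
from collections import Counter

def count_runs(s):
    """Count runs per character (count_runs is an illustrative name)."""
    runs = Counter()
    previous = None  # no character seen yet
    for ch in s:
        # A new run starts whenever the character changes.
        if ch != previous:
            runs[ch] += 1
        previous = ch
    return runs

result = count_runs('TTHHTHHTHHHHTTHHHTTT')
print(result['T'], result['H'])  # 5 4
```

The groupby version is shorter, but the loop shows what is being counted: the number of positions where the character changes, plus the first character.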

Answer 1: (score: 2)

For variety, a re-based approach:

>>> import re
>>> letters = ['H', 'T']
>>> # ({H|T}) captures one letter; \1* consumes its repeats, so each match is one run
>>> matches = re.findall(r'({})\1*'.format('|'.join(letters)), 'TTHHTHHZTHHHHTTHHHTTT')
>>> print(matches)
['T', 'H', 'T', 'H', 'T', 'H', 'T', 'H', 'T']
>>> [(letter, matches.count(letter)) for letter in letters]
[('H', 4), ('T', 5)]
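If the set of letters is not known in advance, the same backreference idea can be sketched with a pattern that matches any character. This is a variation, not the answer's original code: count_runs_re is a hypothetical helper name, and unlike the version above it counts runs of every character, including ones such as 'Z'.

```python
import re
from collections import Counter

def count_runs_re(s):
    """Count runs of every character in s (illustrative helper name)."""
    # (.) captures one character; \1* consumes its immediate repeats,
    # so each match covers exactly one run. With a single group,
    # re.findall returns only the captured character for each match.
    # re.DOTALL lets '.' match newlines as well.
    return Counter(re.findall(r'(.)\1*', s, flags=re.DOTALL))

result = count_runs_re('TTHHTHHTHHHHTTHHHTTT')
print(result['T'], result['H'])  # 5 4
```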