How to group and count random strings?

Date: 2014-08-31 16:41:19

Tags: python count

I have several thousand lines of data like the following:

TTGGGG**TCTCCAT**  
TTCTTC**TCTCCAT**  
TTGGGG**TCTCCAT**  
TTCTTC**TCTCCAT**  
TATTAT**TCTCCAT**  

I want to group and count the data to get output like this:

TTGGGG**TCTCCAT** - 2  
TTCTTC**TCTCCAT** - 2  
TATTAT**TCTCCAT** - 1  

Since the 6 characters in front of the bold part are random, I don't know how to write this in Python.

2 answers:

Answer 0 (score: 0)

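from collections import Counter

with open('path/to/input') as infile:
    # strip() drops the trailing newline so identical sequences compare equal;
    # blank lines are skipped so they are not counted as an empty sequence
    counts = Counter(line.strip() for line in infile if line.strip())
for seq, count in counts.items():
    print(seq, '-', count)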
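The question's desired output lists the most frequent sequences first. `Counter.most_common()` returns `(element, count)` pairs ordered from most to least common, so a small variation of the answer above prints them in that order. A minimal sketch, assuming the sequences arrive as an iterable of lines; the function name `count_sequences` and the in-memory sample (the question's data with the bold markers removed) are illustrative only:

```python
from collections import Counter


def count_sequences(lines):
    """Count identical sequences, ignoring surrounding whitespace and blank lines."""
    return Counter(line.strip() for line in lines if line.strip())


# Hypothetical sample: the question's data, one sequence per line
sample = [
    "TTGGGGTCTCCAT\n",
    "TTCTTCTCTCCAT\n",
    "TTGGGGTCTCCAT\n",
    "TTCTTCTCTCCAT\n",
    "TATTATTCTCCAT\n",
]

counts = count_sequences(sample)
# most_common() yields (sequence, count) pairs from highest to lowest count
for seq, count in counts.most_common():
    print(seq, "-", count)
```

With a real file, pass the open file object instead of the list, e.g. `with open('path/to/input') as infile: counts = count_sequences(infile)`.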

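The solution above uses collections.Counter. If, on the other hand, you would rather not use the helpers from the standard library, you can get the same result like this: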

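counts = {}
with open('path/to/input') as infile:
    for line in infile:
        seq = line.strip()  # drop the trailing newline
        if seq not in counts:
            counts[seq] = 0  # first time this sequence is seen
        counts[seq] += 1
# the file is no longer needed once counting is done
for seq, count in counts.items():
    print(seq, '-', count)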
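Still without any imports, `dict.get()` can replace the explicit `if seq not in counts` check, and the built-in `sorted()` can put the most frequent sequences first as in the question's desired output. A sketch under the same assumptions as before; `count_sequences_plain` and the sample list are hypothetical names used for illustration:

```python
def count_sequences_plain(lines):
    """Count identical sequences using only a plain dict."""
    counts = {}
    for line in lines:
        seq = line.strip()
        if not seq:  # skip blank lines
            continue
        # get() returns 0 the first time a sequence is seen
        counts[seq] = counts.get(seq, 0) + 1
    return counts


sample = ["TTGGGGTCTCCAT", "TTCTTCTCTCCAT", "TTGGGGTCTCCAT",
          "TTCTTCTCTCCAT", "TATTATTCTCCAT"]
counts = count_sequences_plain(sample)

# Sort the (sequence, count) pairs by count, highest first
for seq, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(seq, "-", count)
```

Python's sort is stable even with `reverse=True`, so sequences with equal counts keep the order in which they were first seen.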

Answer 1 (score: 0)

First approach:

Example:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3

So in your case:

>>> ['TTGGGGTCTCCAT', 'TTCTTCTCTCCAT', 'TTGGGGTCTCCAT', 'TTCTTCTCTCCAT', 'TATTATTCTCCAT'].count('TTGGGGTCTCCAT')
2
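A single `count()` call only answers the question for one sequence that you already know. To get every count with this approach, one option is to call `count()` once per distinct sequence. A sketch, assuming the data is already loaded into a list named `sequences` (a hypothetical name):

```python
sequences = ['TTGGGGTCTCCAT', 'TTCTTCTCTCCAT', 'TTGGGGTCTCCAT',
             'TTCTTCTCTCCAT', 'TATTATTCTCCAT']

# dict.fromkeys() keeps each distinct sequence once, in first-seen order
counts = {seq: sequences.count(seq) for seq in dict.fromkeys(sequences)}
for seq, count in counts.items():
    print(seq, '-', count)
```

Each `count()` call scans the whole list, so the work grows with (number of distinct sequences) × (number of lines). That is fine for a few thousand lines, but `Counter` does the same job in a single pass.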

Second approach:

>>> from collections import Counter
>>> z = ['TTGGGGTCTCCAT', 'TTCTTCTCTCCAT', 'TTGGGGTCTCCAT', 'TTCTTCTCTCCAT', 'TATTATTCTCCAT']
>>> Counter(z)
Counter({'TTGGGGTCTCCAT': 2, 'TTCTTCTCTCCAT': 2, 'TATTATTCTCCAT': 1})
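Since the question asks to group as well as count, another standard-library option that neither answer uses is `itertools.groupby`. It only merges adjacent equal items, so the list has to be sorted first, which also means the result comes out in alphabetical order rather than by frequency. A minimal sketch, reusing the question's sample data under the hypothetical name `sequences`:

```python
from itertools import groupby

sequences = ['TTGGGGTCTCCAT', 'TTCTTCTCTCCAT', 'TTGGGGTCTCCAT',
             'TTCTTCTCTCCAT', 'TATTATTCTCCAT']

# groupby() only merges runs of equal neighbours, so sort first;
# sum(1 for _ in group) counts the items in each run
grouped = {seq: sum(1 for _ in group) for seq, group in groupby(sorted(sequences))}
for seq, count in grouped.items():
    print(seq, '-', count)
```

The sort costs O(n log n), so for plain counting `Counter` is usually the simpler choice; `groupby` becomes more useful when you need the grouped items themselves, not just their number.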