I need to count letter frequencies and save each letter with its frequency as a variable

Time: 2014-11-26 00:48:47

Tags: python-3.x

The input file I was given looks like this.

TCCATCTACT
GCGCTTCCTG
TCCATCTACC
TGCGCCCTTA
TCGTACTATC
TTTCGCCACC
TCACGCTTAC
ACCCTGCCGC
CACCTACGCC
CTTCAGCACC

My current code is

def readFile(fileName):
    symbTable = dict()
    with open (fileName, 'r') as f:
        for line in f:
            c = Counter(line)
            symbTable ['A'] = c['A']
            symbTable ['C'] = c['C']
            symbTable ['T'] = c['T']
            symbTable ['G'] = c['G']
    print(symbTable)
    for sym, freq in symbTable.items():
        SymObjList = []
        SymObjList.append(SymbolObject(sym, freq, ""))
        print(SymObjList)
    return symbTable, SymObjList

The problem is that my program only works for a single line of the file. How can I find the letter frequencies across the whole file?

2 answers:

Answer 0: (score: 0)

You need to process all of the lines together. If you call lines = f.readlines() before the loop and then loop over lines, that should work.
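One way to read this suggestion, as a rough sketch: iterating over the file already visits every line, so what seems to matter is that the counts accumulate across lines instead of being overwritten on each pass. In the example below, `count_letters` is a hypothetical helper name (not from the question), and it takes any iterable of lines, such as an open file or the result of `f.readlines()`.

```python
from collections import Counter

def count_letters(lines, alphabet="ACGT"):
    """Accumulate letter counts across all lines (hypothetical helper)."""
    totals = Counter()
    for line in lines:
        # update() adds to the running totals instead of replacing them
        totals.update(line.strip())
    # Keep only the letters of interest; letters never seen count as 0
    return {letter: totals[letter] for letter in alphabet}

# The first two lines of the sample input from the question
sample = ["TCCATCTACT\n", "GCGCTTCCTG\n"]
result = count_letters(sample)
print(result)
```

With a real file, the same helper could be called as `count_letters(f)` inside a `with open(fileName) as f:` block.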

Answer 1: (score: 0)

You were very close:

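import collections

def readFile(filename):
    with open(filename) as infile:
        # Count every character in the whole file in a single pass
        counts = collections.Counter(char for line in infile for char in line)
    # Build the list once, outside any loop, so earlier entries are not lost
    symObjList = []
    for nuc in "ATCG":
        # SymbolObject is the class from the question's own code
        symObjList.append(SymbolObject(nuc, counts[nuc], ""))
        print(nuc, "appears", counts[nuc], "times")

    # Newlines are counted too, but only the four nucleotides are returned
    return {k:counts[k] for k in "ATCG"}, symObjList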
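To try this approach end to end, here is a self-contained sketch. The question does not show the `SymbolObject` class, so the `namedtuple` below is a hypothetical stand-in whose field names are assumptions. The name `read_file` and the temporary file are also only for illustration.

```python
import collections
import os
import tempfile

# Hypothetical stand-in: the question's SymbolObject class is not shown,
# so the field names here are assumptions.
SymbolObject = collections.namedtuple("SymbolObject", ["symbol", "freq", "code"])

def read_file(filename, alphabet="ATCG"):
    with open(filename) as infile:
        # One Counter over every character of every line
        counts = collections.Counter(char for line in infile for char in line)
    # Restrict the result to the letters of interest
    table = {nuc: counts[nuc] for nuc in alphabet}
    objects = [SymbolObject(nuc, freq, "") for nuc, freq in table.items()]
    return table, objects

# Write the first three lines of the question's sample input to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("TCCATCTACT\nGCGCTTCCTG\nTCCATCTACC\n")
    path = tmp.name

try:
    table, objects = read_file(path)
finally:
    os.remove(path)  # clean up the temporary file

print(table)
```

Because the `Counter` is built once over the whole file, nothing is overwritten between lines, which is the behavior the question was missing.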