The input file I am given looks like this:
TCCATCTACT
GCGCTTCCTG
TCCATCTACC
TGCGCCCTTA
TCGTACTATC
TTTCGCCACC
TCACGCTTAC
ACCCTGCCGC
CACCTACGCC
CTTCAGCACC
My current code is:
def readFile(fileName):
    symbTable = dict()
    with open (fileName, 'r') as f:
        for line in f:
            c = Counter(line)
            symbTable ['A'] = c['A']
            symbTable ['C'] = c['C']
            symbTable ['T'] = c['T']
            symbTable ['G'] = c['G']
            print(symbTable)
    for sym, freq in symbTable.items():
        SymObjList = []
        SymObjList.append(SymbolObject(sym, freq, ""))
        print(SymObjList)
    return symbTable, SymObjList
The problem is that my program only works for a single line of the file. How can I find the letter frequencies across the whole file?
Answer 0: (score: 0)
You need to process all of the lines together. If you call lines = f.readlines() before the loop and then iterate over lines, that should work.
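Note that iterating over f already visits every line, so reading the lines up front will not change the result on its own; the counts also have to be accumulated rather than overwritten on each pass. A minimal sketch of that idea, assuming one sequence per line as in the question (the function name count_letters is made up for illustration):

```python
from collections import Counter


def count_letters(file_name):
    """Accumulate letter counts over every line of the file."""
    totals = Counter()
    with open(file_name, 'r') as f:
        for line in f:
            # update() adds to the running totals instead of replacing them,
            # so earlier lines are not lost when the next line is read.
            totals.update(line.strip())
    # Report only the four nucleotides; a missing letter counts as 0.
    return {sym: totals[sym] for sym in "ACTG"}
```

Calling count_letters on the input file returns a single dictionary covering the whole file, which can then be turned into SymbolObject instances as in the question.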
Answer 1: (score: 0)
You are very close:
import collections

def readFile(filename):
    with open(filename) as infile:
        # Count every character of every line in a single pass over the file.
        counts = collections.Counter(char for line in infile for char in line)
    symObjList = []
    for nuc in "ATCG":
        # counts[nuc] is 0 for a nucleotide that never appears.
        symObjList.append(SymbolObject(nuc, counts[nuc], ""))
        print(nuc, "appears", counts[nuc], "times")
    return {k: counts[k] for k in "ATCG"}, symObjList
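To try this approach without the rest of the asker's program, the sketch below swaps in a stand-in for SymbolObject, which is not shown in the question. The namedtuple and its field names (symbol, freq, code) are assumptions made only so the example runs, and the temporary file holds the first two lines of the sample input.

```python
import collections
import os
import tempfile

# Stand-in for the asker's SymbolObject class; the field names are assumptions.
SymbolObject = collections.namedtuple("SymbolObject", ["symbol", "freq", "code"])


def read_file(filename):
    with open(filename) as infile:
        # One Counter over all characters of all lines (newlines included,
        # which is harmless because only A, T, C and G are looked up below).
        counts = collections.Counter(char for line in infile for char in line)
    sym_obj_list = [SymbolObject(nuc, counts[nuc], "") for nuc in "ATCG"]
    return {nuc: counts[nuc] for nuc in "ATCG"}, sym_obj_list


# Write the first two lines of the sample input to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("TCCATCTACT\nGCGCTTCCTG\n")
    path = tmp.name

table, objects = read_file(path)
os.remove(path)
print(table)
```

Because the Counter is built once from the whole file, there is no per-line dictionary to overwrite, which is the step the original loop was missing.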