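I have been working on a project that requires me to count the number of 1's and 0's in a file; these values describe the effect of each amino acid on peptide stability. The file contains about 300 different peptide sequences. I want my code to identify the start of each peptide sequence in the text file, calculate its length, and then count the number of 1's and 0's recorded for its amino acids. So far I have been struggling to get my code to use index numbers to identify the start of a sequence. Here is what I have: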
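```python
# Count residues and ANCHOR outputs (1 or 0) for each peptide in the file.
input_path = r'C:/Users/12345/Documents/Dr Blan Research/MHC I 17 NOV2016.txt'

peptides = []  # one [length, ones, zeros] entry per peptide
with open(input_path) as input_file01:
    for line in input_file01:
        templist = line.split()  # work with fields, not single characters
        # Skip blank lines and the '#' header lines.
        if not templist or templist[0].startswith('#'):
            continue
        # Amino acid number 1 marks the start of a new peptide sequence.
        if templist[0] == '1':  # fields are strings, so compare with '1', not 1
            peptides.append([0, 0, 0])
        peptides[-1][0] += 1  # '+=' increments; '=+1' just assigns 1
        if templist[3] == '1':  # column 4 is the ANCHOR output
            peptides[-1][1] += 1
        else:
            peptides[-1][2] += 1

with open('MHC I 17 NOV2016OUT.txt', 'w') as Output_file01:
    for length, ones, zeros in peptides:
        # length, number of 1's, number of 0's, fraction of residues that are 1
        Output_file01.write('{}\t{}\t{}\t{:.3f}\n'.format(length, ones, zeros, ones / length))
```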
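A hedged sketch of the sequence-start detection described above: rather than indexing individual characters of each line, split each line into fields and start a new peptide whenever the amino acid number (column 1) is 1. This assumes whitespace-separated columns and that numbering restarts at 1 for every peptide; `split_peptides` is a hypothetical helper name, and the second peptide in the demo input is invented purely for illustration.

```python
from typing import Iterable, List, Tuple

# (amino acid number, one-letter code, ANCHOR probability, ANCHOR output)
Row = Tuple[int, str, float, int]


def split_peptides(lines: Iterable[str]) -> List[List[Row]]:
    """Group data rows into peptides, starting a new peptide at amino acid number 1."""
    peptides: List[List[Row]] = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the '#' header block.
        if not fields or fields[0].startswith('#'):
            continue
        number, residue, prob, output = fields[:4]
        row = (int(number), residue, float(prob), int(output))
        # Assumption: numbering restarts at 1, so 1 marks a new sequence.
        # The 'not peptides' guard also handles a file whose first row is not 1.
        if row[0] == 1 or not peptides:
            peptides.append([])
        peptides[-1].append(row)
    return peptides


if __name__ == '__main__':
    demo = """# 1 - Amino acid number
#
1 A 0.3129 0
2 P 0.4044 0
3 K 0.5258 1
1 M 0.2000 0
2 K 0.6000 1
""".splitlines()  # the second peptide (M, K) is made up for illustration
    for pep in split_peptides(demo):
        print(len(pep), ''.join(r[1] for r in pep))
```

Parsing each field with `int`/`float` up front avoids the string-versus-integer comparison problem, and the per-peptide length is simply `len()` of each group.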
Here is an example of the content of the file. I want my code to find the length of each peptide sequence, count the number of 1's and 0's, and compute their ratio within each peptide sequence.
```
# 1 - Amino acid number
# 2 - One letter code
# 3 - ANCHOR probability value
# 4 - ANCHOR output
#
1 A 0.3129 0
2 P 0.4044 0
3 K 0.5258 1
4 R 0.6358 1
5 P 0.7277 1
6 P 0.7895 1
7 S 0.8710 1
8 A 0.9358 1
9 F 0.9680 1
```
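For the counting and ratio step, a minimal sketch applied to the nine sample rows above. The question does not say which ratio is wanted, so the hypothetical `anchor_stats` helper reports both the fraction of residues with ANCHOR output 1 and the 1:0 ratio; both definitions are assumptions.

```python
from collections import Counter
from typing import Dict, List


def anchor_stats(outputs: List[int]) -> Dict[str, float]:
    """Count ANCHOR outputs (column 4) for one peptide; return length, counts and ratios."""
    counts = Counter(outputs)
    length = len(outputs)
    ones, zeros = counts[1], counts[0]  # Counter returns 0 for missing keys
    return {
        'length': length,
        'ones': ones,
        'zeros': zeros,
        # Assumed ratio 1: share of residues predicted as 1.
        'frac_ones': ones / length if length else 0.0,
        # Assumed ratio 2: 1's per 0 (inf when the peptide has no 0's).
        'ones_per_zero': ones / zeros if zeros else float('inf'),
    }


sample = """1 A 0.3129 0
2 P 0.4044 0
3 K 0.5258 1
4 R 0.6358 1
5 P 0.7277 1
6 P 0.7895 1
7 S 0.8710 1
8 A 0.9358 1
9 F 0.9680 1"""

# Column 4 (index 3) holds the ANCHOR output for each residue.
outputs = [int(line.split()[3]) for line in sample.splitlines()]
print(anchor_stats(outputs))  # 9 residues: 7 ones, 2 zeros
```

For the full file, `anchor_stats` would be called once per peptide on the outputs of that peptide's rows, after splitting the rows wherever the amino acid number returns to 1.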