I have several datasets in Phylip format (shown below) that I want to convert to Fasta (also shown below) using this Python code:
for j in range(1, 10):
    inFile = open('/path/to/input_sequence/seqfile_00' + str(j) + '.txt', 'r')
    outFile = open('/path/to/output_sequence/Fasta/seqfile_00' + str(j) + '.txt', 'w')
    inLines = inFile.readlines()
    inFile.close()
    outLines = inLines[1:17]
    for line in outLines:
        if line.startswith('\n'):
            line = line.replace('\n','')
        outFile.write(line.replace(' ',' \n').replace('sequence', '>sequence'))
    outFile.close()
This is what my Phylip files (input_sequences) look like:
8 1500\n
\n
sequence1 CTGTCCTTG...\n
\n
sequence2 CTGTCGTTG...\n
\n
sequence3 CTGCGTATG...\n
\n
sequence4 CTATGCCTG...\n
\n
sequence5 AGGTGTAAG...\n
\n
sequence6 AGGTGTAAG...\n
\n
sequence7 AAATTCAAA...\n
\n
sequence8 AAGTCCAAA...\n
\n
This is the output_sequences (Fasta format) I want:
>sequence1 \n
CTGTCCTTGG...\n
>sequence2 \n
CTGTCGTTGG...\n
>sequence3 \n
CTGCGTATGG...\n
>sequence4 \n
CTATGCCTGG...\n
>sequence5 \n
AGGTGTAAGG...\n
>sequence6 \n
AGGTGTAAGA...\n
>sequence7 \n
AAATTCAAAG...\n
>sequence8 \n
AAGTCCAAAA...\n
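For reference, the conversion described above can be sketched without relying on an exact separator string. This is only a minimal sketch: the function name `phylip_to_fasta` is made up for illustration, and it assumes the layout shown in the sample (one header line, then one record per line with the name first and the sequence after it). It splits each line on any run of whitespace instead of replacing a fixed string, and it writes the name without the trailing space shown in the desired output.

```python
def phylip_to_fasta(lines):
    """Convert the lines of a simple one-record-per-line Phylip file to FASTA text.

    Assumes the first line is the "count length" header and every other
    non-blank line is "<name><whitespace><sequence>".
    """
    records = []
    for line in lines[1:]:            # skip the header line, e.g. "8 1500"
        parts = line.split()          # split on any run of whitespace (spaces, tabs, ...)
        if len(parts) < 2:            # blank line or a line without a sequence
            continue
        name = parts[0]
        sequence = "".join(parts[1:])  # rejoin in case the sequence contains spaces
        records.append(">" + name + "\n" + sequence + "\n")
    return "".join(records)


# Small in-memory demo: one record separated by spaces, one by a tab.
sample = ["2 9\n", "\n", "sequence1      CTGTCCTTG\n", "\n", "sequence2\tCTGTCGTTG\n", "\n"]
print(phylip_to_fasta(sample), end="")
```

With real files, the list passed in would be the result of `readlines()` on the input file, and the returned string would be written to the output file.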
When I run the code above, I get the correct output for j = 1, but for the remaining values of j (2 to 9) I get this output:
\n
>sequence1 *red inverted question mark*CTGTCCTTGG...\n
>sequence2 *red inverted question mark*CTGTCGTTGG...\n
>sequence3 *red inverted question mark*CTGCGTATGG...\n
>sequence4 *red inverted question mark*CTATGCCTGG...\n
>sequence5 *red inverted question mark*AGGTGTAAGG...\n
>sequence6 *red inverted question mark*AGGTGTAAGA...\n
>sequence7 *red inverted question mark*AAATTCAAAG...\n
>sequence8 *red inverted question mark*AAGTCCAAAA...\n
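(... means the sequence continues; the red inverted question mark is what I see in my text editor when I turn on the option to show invisible characters.)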
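One way to find out what the invisible character actually is would be to print the start of each record line with `repr()`, which shows tabs and other non-printing characters as escape sequences. The helper name `peek` below is hypothetical and the demo lines are made up; the real input would come from `readlines()` on one file that converts correctly and one that does not.

```python
def peek(lines, width=30):
    """Return the repr() of the first `width` characters of every non-blank line."""
    return [repr(line[:width]) for line in lines if line.strip()]


# Made-up demo data: the first record uses spaces, the second uses a tab.
demo = ["sequence1      CTGTCCTTG\n", "\n", "sequence2\tCTGTCGTTG\n"]
for shown in peek(demo):
    print(shown)
```

If the separator differs between the file that works and the files that do not, a `replace()` call that looks for one exact separator string would only match in the first case.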
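My general question, and the reason I am confused, is this: why does the code work correctly for j = 1 but not for the other numbers? And how can I fix it?
Thanks in advance!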