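My task is to join the text in a text file, but nothing I have tried works properly. I tried split, but it needs a string rather than a list, and join did not help me at all, since I already have code that would do that job.
The text file looks like this (file name = demo_fasta_file_2019.fsa):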
>sequence_1
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
>sequence_2
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
GATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGAGATCGATCGA
>sequence_3
TTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAA
TTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAATTTTGGAAAA
>sequence_4
GGTTAACCATGGATC
The code I have is as follows:
#def Read_FastA_Names_And_Sequences(filepath):
#############
filepath=str("demo_fasta_file_2019.fsa")
##sequence_names,sequences = Read_FastA_Names_And_Sequences(filepath)
sequence_names=[]
sequences=[]
number_of_sequences=4
#############
textfile = open(filepath, 'r')
sequence = textfile.readlines()
for i in sequence:
    if i.__contains__('>'):
        a=i[1:]
        sequence_names.append(a[:a.__len__()-1])
    i=+1
print(sequence)
#list1 = sequence
#s = "\n"
#s = s.join(list1)
#print(s)
list2 = sequence
words2 = list2.split(">")
print(words2)
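So my question is: how do I join only the text that does not contain >sequence_1, >sequence_2, >sequence_3, >sequence_4?
Answer 0 (score: 1)
This is easy to do with Biopython, which may also be useful for other tasks on FASTA files: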
from Bio import SeqIO

input_file = "demo_fasta_file_2019.fsa"  # path to the FASTA file from the question
concatenated_sequence = ""
# SeqIO.parse accepts a file name and yields one record per FASTA entry
for fasta in SeqIO.parse(input_file, 'fasta'):
    # the id is stored in fasta.id
    # the sequence is stored in fasta.seq and needs to be converted with str()
    concatenated_sequence += str(fasta.seq)
Answer 1 (score: 0)
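You can use a generator expression to keep only the lines that do not start with >, strip the trailing newline from each, and concatenate them with str.join:
with open("demo_fasta_file_2019.fsa") as fasta_file:
    print(''.join(line.strip() for line in fasta_file if not line.startswith('>')))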
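If you need the names and one joined sequence per record (which is what the commented-out Read_FastA_Names_And_Sequences in the question seems to be aiming for), the same idea can be extended without any library. This is only a minimal sketch: the function name read_fasta_names_and_sequences and the shortened demo_lines data are made up for illustration, and it assumes every header line starts with > and that sequence lines contain no other markup.

```python
def read_fasta_names_and_sequences(lines):
    """Return (names, sequences) from an iterable of FASTA-formatted lines."""
    names = []
    sequences = []  # one list of line fragments per record
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            names.append(line[1:])  # drop the leading '>'
            sequences.append([])    # start collecting a new record
        elif sequences:
            sequences[-1].append(line)  # fragment of the current record
    # join the fragments of each record into a single string
    return names, ["".join(parts) for parts in sequences]


# Hypothetical, shortened stand-in for the lines of demo_fasta_file_2019.fsa
demo_lines = [
    ">sequence_1\n",
    "GATC\n",
    ">sequence_2\n",
    "GATC\n",
    "TTAA\n",
]
names, sequences = read_fasta_names_and_sequences(demo_lines)
print(names)
print(sequences)
```

Because the function takes any iterable of lines, it should also work on an open file object, for example by passing the handle from with open(filepath) as f. Note that list.split does not exist, which is why the split call in the question fails: readlines returns a list of strings, not a single string.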