I have a file named sequence.txt, and I have already split its contents into lists. It looks like this:
Original file:
102L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
103L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
104L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSAAELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
After splitting them into lists:
['>102L', 'Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL']
['>103L', 'Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL']
['>104L', 'Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSAAELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL']
I have another file named title.txt that contains the names/titles of all the sequences I want. It looks like this:
>102L
>104L
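Based on this title.txt file, I want to filter out every sequence whose title is not in the title list and store the remaining sequences in another file named filter_sequence.txt. The new file should look like this: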
102L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
104L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSAAELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
Notice that 103L is gone. I am using Python and I don't know how to approach this. Can anyone help me? Thanks!
Here is my final code:
import string
fin = open('title.txt')
all_titles = fin.readlines()
fin.close()
all_titles = map(string.strip, all_titles)
f = open('filtered_sequence.txt', 'w')
sequence_list = open('sequence.txt')
for sequence in sequence_list:
    lists = sequence.strip() # Strip the sequence file into lists of sequence
    if lists[0] in all_titles:
        write_string = lists[0] + lists[1] + "\n\n"
        f.write(write_string)
f.close()
title.txt is:
>102L
>104L
sequence.txt is:
102L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
103L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
104L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSAAELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
I want my filtered_sequence.txt to look like:
102L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
104L Sequence:MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSAAELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
But the filtered_sequence.txt file is empty. Can you help me?
Answer 0 (score: 0)
I would store the second file in a list, I think.
with open("title.txt", "r") as f:
    # Get the data and strip off newlines. A list comprehension behaves the
    # same on Python 2 and 3, unlike map(string.strip, ...).
    all_titles = [line.strip() for line in f]
all_titles then contains ['>102L','>104L']. From there, just do an "item in list" test:
# sequence_list is assumed to hold the [title, sequence] pairs built from sequence.txt.
f = open("filter_sequence.txt","w") # The file to write to.
for sequence in sequence_list:
    if sequence[0] in all_titles: # sequence[0] is the sequence title.
        write_string = str(sequence[0]) + ":\nSequence:" + str(sequence[1]) + "\n\n"
        f.write(write_string) # Write the string above.
f.close() # Close the file.
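The loop relies on sequence_list holding [title, sequence] pairs, which is where the question's final code goes wrong: sequence.strip() returns a single string, so lists[0] is only the first character of the line and never matches a title. A minimal sketch of building the pairs with split() instead, assuming each record sits on one line with whitespace between the title and the sequence (parse_records is a hypothetical helper name, and the sample sequences are shortened placeholders):

```python
def parse_records(lines):
    """Split each line into [title, rest] at the first run of whitespace."""
    records = []
    for line in lines:
        # strip() removes the trailing newline; split(None, 1) cuts once.
        parts = line.strip().split(None, 1)
        if len(parts) == 2:  # Skip blank lines and lines with no sequence part.
            records.append(parts)
    return records


# Shortened, made-up sample lines in the same shape as the question's data.
sample = [">102L Sequence:MNIF\n", "\n", ">103L Sequence:MNIFEM\n"]
sequence_list = parse_records(sample)
print(sequence_list)
```

Each element of sequence_list then has the title at index 0 and the sequence text at index 1, which is what the loop expects.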
That should do it. item in list is a quick boolean test that checks whether any element of list equals item.
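As a small illustration of that membership test, using the two titles from title.txt. The set at the end is an optional variation that is not part of the answer: for a long title list it gives the same result without scanning every element.

```python
all_titles = [">102L", ">104L"]

found = ">102L" in all_titles    # An equal element exists in the list.
missing = ">103L" in all_titles  # No element equals '>103L'.
print(found, missing)

# Optional: a set gives the same answers with faster lookups on large inputs.
title_set = set(all_titles)
in_set = ">104L" in title_set
print(in_set)
```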
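Note: if you want to write 102L instead of >102L, you can drop the first character of sequence[0] by writing sequence[0][1:]. This takes the substring that starts at character 1 (the second character) and continues to the end.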
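The same slice also helps when the two files disagree about the leading ">", as in the question's later listing where title.txt has >102L but sequence.txt lines start with 102L. A hedged sketch that compares titles with the marker removed; bare_title and select_lines are hypothetical helper names, the sample sequences are placeholders, and one record per line is assumed:

```python
def bare_title(title):
    """Drop one leading '>' if present, using the [1:] slice."""
    title = title.strip()
    return title[1:] if title.startswith(">") else title


def select_lines(title_lines, sequence_lines):
    """Return the sequence lines whose first field matches a wanted title."""
    wanted = {bare_title(t) for t in title_lines if t.strip()}
    kept = []
    for line in sequence_lines:
        parts = line.split()  # parts[0] is the title field, if the line is not blank.
        if parts and bare_title(parts[0]) in wanted:
            kept.append(line.strip())
    return kept


# Placeholder data in the shape of the question's files.
titles = [">102L\n", ">104L\n"]
lines = ["102L Sequence:AAA\n", "103L Sequence:CCC\n", "104L Sequence:GGG\n"]
for kept_line in select_lines(titles, lines):
    print(kept_line)
```

Writing each kept line plus a newline inside a with open("filtered_sequence.txt", "w") block would then produce the output file.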