I have a GFF file named 50267.gff that looks like this:
#start gene g1
dog1
dog2
dog3
#protein sequence = [DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD]
#end gene g1
###
#start gene g2
cat1
cat2
cat3
#protein sequence = [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
#CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
#end gene g2
###
#start gene g3
pig1
pig2
pig3
...
I want to extract the content between the brackets and write it to a new file named 50267.fa, like this:
>g1_50267
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>g2_50267
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCC
...
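Answer 0 (score: 0)
You need to escape the square brackets in the regular expression. Then you can use a capture group to get the content inside them.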
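A minimal Python sketch of that approach, run on a shortened, hypothetical excerpt of the file (the sample string is made up for illustration). Escaping turns [ and ] into literal brackets, and the parenthesised group returns only what lies between them. Because the g2 sequence is wrapped onto a second line, the lazy .*? form only finds it with re.DOTALL, while a negated class such as [^\]] crosses newlines on its own.

```python
import re

# Hypothetical excerpt mirroring the layout of 50267.gff (not the real data).
sample = """#start gene g1
dog1
#protein sequence = [DDDD]
#end gene g1
###
#start gene g2
cat1
#protein sequence = [CCCC
#CCC]
#end gene g2
"""

# \[ and \] match literal brackets; (...) captures what lies between them.
# Without re.DOTALL, '.' stops at newlines, so the wrapped g2 sequence is missed.
without_dotall = re.findall(r"\[(.*?)\]", sample)
print(without_dotall)  # ['DDDD']

# With re.DOTALL, '.' also matches '\n', so both sequences are captured.
sequences = re.findall(r"\[(.*?)\]", sample, re.DOTALL)
print(sequences)  # ['DDDD', 'CCCC\n#CCC']

# A negated character class already matches newlines, so no flag is needed.
negated = re.findall(r"\[([^\]]+)\]", sample)
print(negated)  # ['DDDD', 'CCCC\n#CCC']
```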
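Answer 1 (score: 0)
You can use \[(.*?)\] (with the re.DOTALL flag, so that . also matches the newline inside the wrapped g2 sequence)
or \[([^\]]+), which matches across newlines without any flag: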
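import re

# Read the whole file and capture everything between "[" and the next "]".
with open("50267.gff", "r") as ff:
    matches = re.findall(r'\[([^\]]+)', ff.read())

# Label records g1, g2, ... in order of appearance (assumes consecutive gene numbers).
# A wrapped sequence continues on a line starting with "#", so "\n#" becomes a space.
matches = ['>g' + str(ind + 1) + "_50267\n" + x.replace('\n#', ' ') for ind, x in enumerate(matches)]

with open('50267.fa', 'w') as fa:
    fa.write("\n".join(matches) + "\n")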
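If the genes are not numbered consecutively from g1, counting matches will mislabel records. The following is a sketch that instead takes the gene id from each #start gene line and the tag from the input filename. gff_to_fasta and BLOCK_RE are hypothetical names, and it assumes each gene block holds at most one "#protein sequence = [...]" entry. Unlike the code above, it joins wrapped lines with no space, which is the usual FASTA convention; pass " " to the replace call to reproduce the spaced line shown in the question.

```python
import re
from pathlib import Path

# Hypothetical pattern: gene id from "#start gene <id>", then the bracketed sequence.
# The tempered token (?:(?!#end gene).)*? stops a block without a protein
# sequence (like the truncated g3) from borrowing the next gene's sequence.
BLOCK_RE = re.compile(
    r"#start gene (\S+)(?:(?!#end gene).)*?#protein sequence = \[([^\]]+)\]",
    re.DOTALL,
)


def gff_to_fasta(gff_path: str, fa_path: str) -> None:
    """Write one FASTA record per gene block that has a protein sequence."""
    tag = Path(gff_path).stem  # "50267" for "50267.gff"
    text = Path(gff_path).read_text()
    records = []
    for gene_id, seq in BLOCK_RE.findall(text):
        # Wrapped lines continue with "#": drop the newline and the "#".
        seq = seq.replace("\n#", "")
        records.append(f">{gene_id}_{tag}\n{seq}")
    Path(fa_path).write_text("\n".join(records) + "\n")
```

Usage: gff_to_fasta("50267.gff", "50267.fa").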