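I hope this question isn't redundant. I'm trying to write a script that can search for, and change, any single nucleotide in a gene sequence. I've looked into using arrays and patterns, but I'm not sure how to go about it. I'm very new to programming, so I'm still learning. Any help is appreciated!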
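A note on the "arrays" idea: a Python string cannot be changed in place (`seq[3] = "c"` raises `TypeError`), so "changing a nucleotide" means building a new string. Below is a minimal sketch of the two usual ways to do that; `mutate_slice` and `mutate_list` are hypothetical helper names, and positions are assumed to be 0-based Python indices.

```python
def mutate_slice(seq, pos, new_base):
    """Return a copy of seq with the base at 0-based index pos replaced by new_base."""
    if not 0 <= pos < len(seq):
        raise IndexError("position %d is outside a %d-base sequence" % (pos, len(seq)))
    # Strings are immutable: glue together the parts before and after pos
    return seq[:pos] + new_base + seq[pos + 1:]


def mutate_list(seq, pos, new_base):
    """Same result via a list of characters (the "array" approach)."""
    if not 0 <= pos < len(seq):
        raise IndexError("position %d is outside a %d-base sequence" % (pos, len(seq)))
    bases = list(seq)      # unlike a str, a list can be changed in place
    bases[pos] = new_base
    return "".join(bases)  # turn the list back into a string


seq = "atggtgcatctg"
print(mutate_slice(seq, 3, "c"))  # atgctgcatctg
print(mutate_list(seq, 3, "c"))   # atgctgcatctg
```

Biological positions usually count from 1, so position `p` corresponds to index `p - 1`. Slicing is simplest for one change; the list version is handy when many bases are changed before joining once.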
Here is what I have so far:
```python
#!/usr/bin/python
from collections import defaultdict
import re, sys, random
# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
'''
Objectives:
1. Read in a sequence
2. Find a specific segment of that sequence
3. Change a letter (mutation)
4. Output the sequence with the mutation
'''
# - - - - - U S E R V A R I A B L E S - - - - - - - -
mssg = " Search and Destroy"
genFile = 'P1.txt'
inFile = 'P1.txt'
inFolder = '.'
site = "" # what the mutation is
outFile = "Project1-Out.txt"
GenSeqs = defaultdict(str)  # key = header, value = sequence; missing keys start as ""
#- - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
print("\n\n", mssg, ". . . . ")
# Task 1: Open a file and read the sequences
number = 0     # number of '>' header lines (i.e. sequences) seen so far
header = None  # no header read yet
with open(inFile, 'r') as IN1:   # closes the file automatically
    for line in IN1:
        line = line.rstrip()     # remove right white space, including the newline
        if not line:
            continue             # skip blank lines
        if line.startswith('>'):
            header = line
            number += 1
        elif header is not None:
            # A FASTA sequence is wrapped over several lines, so append
            # each line instead of overwriting the previous one
            GenSeqs[header] += line
print("There are %d gene sequences in file %s" % (number, inFile))
```
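For Task 2 (finding a segment), the "patterns" idea maps onto the standard `re` module. This is a sketch, not the only way: `find_motif` is a hypothetical name, and it assumes you want every match, including overlapping ones, regardless of case. Plain `seq.find(motif)` would only return the first index (or `-1`).

```python
import re


def find_motif(seq, motif):
    """Return the 0-based start of every occurrence of motif in seq,
    including overlapping ones, ignoring upper/lower case."""
    # A zero-width lookahead (?=...) consumes no characters, so finditer
    # can report a new match starting at every position
    pattern = re.compile("(?=%s)" % re.escape(motif), re.IGNORECASE)
    return [m.start() for m in pattern.finditer(seq)]


seq = "atggtgcatctgaccccggaagaaaaaagcg"  # first 31 bases of P1.txt
print(find_motif(seq, "gaa"))  # [18, 21]
print(find_motif(seq, "AAA"))  # [22, 23, 24, 25]
```

Each returned index marks where the motif starts, so the base to change inside it is at that index plus its offset within the motif.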
Here is the input file for this task:
```
>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
```
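This file holds one 441-base record wrapped at 60 bases per line, so for Task 4 the mutated sequence probably should be re-wrapped the same way when written out. A minimal sketch, assuming plain FASTA output is wanted; `write_fasta` is a hypothetical helper, and the demo writes to an in-memory buffer instead of `outFile`.

```python
import io


def write_fasta(out, header, seq, width=60):
    """Write one FASTA record to the open text handle `out`,
    wrapping the sequence at `width` bases per line (60, like P1.txt)."""
    if not header.startswith(">"):
        header = ">" + header  # FASTA headers must begin with '>'
    out.write(header + "\n")
    for start in range(0, len(seq), width):
        out.write(seq[start:start + width] + "\n")


# Demo with an in-memory buffer instead of a real file
buf = io.StringIO()
write_fasta(buf, ">demo", "atggtgcatctgaccccggt", width=8)
print(buf.getvalue(), end="")
# >demo
# atggtgca
# tctgaccc
# cggt
```

With a real file the same function would be called inside `with open(outFile, "w") as out:`, once per `header, seq` pair in `GenSeqs.items()`.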