How do I search for and modify a single nucleotide in a gene sequence with Python?

Posted: 2018-01-08 14:25:58

Tags: python

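I hope this question isn't redundant. I'm trying to write a script that can search for and modify any single nucleotide in a gene sequence. I've looked into using arrays and motifs, but I'm not sure how to go about it. I'm very new to programming, so I'm still learning. Any help is appreciated!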
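The core operation, replacing one nucleotide, can be sketched as follows. Python strings are immutable, so the "modified" sequence is a new string built from the slices on either side of the target position. This is a minimal sketch under some assumptions: `mutate_at` is a hypothetical helper name, positions are 0-based, and the motif `cat` is only an illustrative search target.

```python
def mutate_at(seq, pos, new_base):
    """Hypothetical helper: return seq with the base at 0-based index pos replaced."""
    if not 0 <= pos < len(seq):
        raise IndexError("position %d is outside the sequence" % pos)
    if len(new_base) != 1 or new_base.lower() not in "acgt":
        raise ValueError("new_base must be a single A, C, G or T")
    # Strings cannot be changed in place, so build a new one from the two slices.
    return seq[:pos] + new_base + seq[pos + 1:]


seq = "atggtgcatctg"
hit = seq.find("cat")          # index of the first match, or -1 if it is absent
if hit != -1:
    mutant = mutate_at(seq, hit + 1, "g")   # change the middle base of "cat"
    print(mutant)              # atggtgcgtctg
```

`str.find` only reports the first match; scanning for every occurrence needs a loop or a regular expression.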

Here's what I have so far:

#!/usr/bin/env python3

from collections import defaultdict
import re, sys, random

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
'''
Objectives:
1. Read in a sequence
2. Find a specific segment of that sequence
3. Change a letter (mutation)
4. Output the sequence with the mutation

'''

# - - - - - U S E R    V A R I A B L E S - - - - - - - -

mssg = " Search and Destroy"
genFile  = 'P1.txt'
inFile   = 'P1.txt'
inFolder = '.'
site   = ""  # what the mutation is 
outFile  = "Project1-Out.txt"
GenSeqs = defaultdict(str)   # key = FASTA header, value = sequence; missing keys give ""
#- - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

print("\n\n", mssg, ". . . . ")


# Task 1: Open a file and read the sequences
header = None
with open(inFile, 'r') as IN1:
    for line in IN1:
        line = line.strip()              # remove surrounding white space
        if not line:
            continue                     # skip blank lines
        if line.startswith('>'):
            header = line                # a '>' line starts a new record
            GenSeqs[header] = ""         # key = header, value = sequence
        elif header is not None:
            GenSeqs[header] += line      # a sequence can span several lines, so append

print("There are %d gene sequences in file %s" % (len(GenSeqs), inFile))
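Objectives 2 to 4 (find a segment, change a letter, write out the result) could be sketched roughly as below. This is an illustration, not a definitive implementation: the helper names `find_segment`, `point_mutation` and `to_fasta`, the segment `ccggaa`, the offset and the replacement base are all hypothetical choices, and a small stand-in dictionary replaces the `GenSeqs` built from the real file so the sketch runs on its own. The search uses a zero-width lookahead `(?=...)` so that overlapping matches are reported too.

```python
import re


def find_segment(seq, segment):
    """Return the 0-based start of every match of segment, overlaps included."""
    # A zero-width lookahead lets re.finditer report overlapping hits ("aa" in "aaa" -> 0, 1).
    pattern = "(?=" + re.escape(segment.lower()) + ")"
    return [m.start() for m in re.finditer(pattern, seq.lower())]


def point_mutation(seq, pos, new_base):
    """Return a copy of seq with the base at 0-based index pos replaced."""
    return seq[:pos] + new_base + seq[pos + 1:]


def to_fasta(header, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width` bases."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return header + "\n" + "\n".join(lines) + "\n"


# Stand-in for the GenSeqs dictionary from Task 1 (illustrative data, not the real file).
GenSeqs = {">example": "atggtgcatctgaccccggaagaaaaa"}
segment = "ccggaa"   # hypothetical segment to search for
offset = 4           # which base of the segment to change, 0-based
new_base = "t"       # hypothetical replacement base
outFile = "Project1-Out.txt"
assert 0 <= offset < len(segment), "offset must point inside the segment"

with open(outFile, "w") as out:
    for header, seq in GenSeqs.items():
        hits = find_segment(seq, segment)
        if not hits:
            print("%s: segment %s not found" % (header, segment))
            continue
        pos = hits[0] + offset                    # mutate the first match only
        mutant = point_mutation(seq, pos, new_base)
        # Positions are reported 1-based, as is usual for sequences.
        print("%s: base %d changed from %s to %s" % (header, pos + 1, seq[pos], new_base))
        out.write(to_fasta(header, mutant))
```

Only the first match is mutated here; looping over `hits` instead would change every occurrence. Wrapping the output at 60 bases per line keeps it in the same layout as the input file.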

Here is the input file for this task:

>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
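Because this sequence starts with the ATG start codon and is 441 bases (147 codons) long, it can be convenient to address a mutation by codon rather than by raw position. The sketch below assumes the reading frame starts at the first base; `mutate_codon_base` is a hypothetical helper, and changing the second base of codon 7 (`gaa`, Glu) to give `gta` (Val) is only an illustrative choice. The file's 60-base lines are joined first, because positions only make sense in the unwrapped sequence.

```python
# The sequence from the question with its 60-base lines joined into one string;
# positions only make sense once the line breaks are removed.
HBB = (
    "atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac"
    "gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag"
    "cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa"
    "gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat"
    "aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat"
    "ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc"
    "aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac"
    "gcgctggcgcataaatatcat"
)


def mutate_codon_base(seq, codon_number, base_in_codon, new_base):
    """Hypothetical helper: change one base inside a codon.

    codon_number is 1-based (codon 1 is the ATG start codon) and
    base_in_codon is 1, 2 or 3. Returns the mutated sequence together
    with the old and new codon so the change can be reported.
    """
    if not 1 <= base_in_codon <= 3:
        raise ValueError("base_in_codon must be 1, 2 or 3")
    start = (codon_number - 1) * 3
    if codon_number < 1 or start + 3 > len(seq):
        raise IndexError("codon %d is outside the sequence" % codon_number)
    pos = start + base_in_codon - 1
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return mutant, seq[start:start + 3], mutant[start:start + 3]


mutant, old_codon, new_codon = mutate_codon_base(HBB, 7, 2, "t")
print(len(HBB), old_codon, "->", new_codon)   # 441 gaa -> gta
```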

0 answers