How do I make combinations of two or more specific letter substitutions?

Asked: 2014-07-17 14:52:12

Tags: python list iterator combinations itertools

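I am new to Python and have been struggling with this for the past week. Could someone help me solve it? It would be a great help in finishing my project.

I am trying to generate single mutations of a given sequence from user input, and then the 2- and 3-way combinations of those mutations: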

INPUT SEQUENCE:> PEACCEL

User mutation input file:
   E2R
   C4W
   E6G

#!/usr/bin/python
import getopt
import sys
import itertools as it
from itertools import groupby

def main(argv):
    seq = conA = None
    try:
        # Short options take a value via a trailing ':', long options via a trailing '='.
        opts, operands = getopt.getopt(argv, 'i:m:o:', ["INPUT_FILE=", "MUTATION_FILE=", "OUTPUT_FILE=", "help"])
        if len(opts) == 0:
            print("Please use the correct arguments, for usage type --help")
        else:
            for option, value in opts:
                if option == "-i" or option == "--INPUT_FILE":
                    seq = inputFile(value)
                if option == "-m" or option == "--MUTATION_FILE":
                    conA = MutationFile(value)
                if option == "-o" or option == "--OUTPUT_FILE":
                    out = outputName(value)  # outputName is not shown in this excerpt
    except getopt.GetoptError as err:
        print(str(err))
        print("Please use the correct arguments, for usage type --help")
    # Always return a pair so the caller can unpack it, even after an error.
    return seq, conA

def inputFile(value):
    try:
        fh = open(value, 'r')
    except IOError:
        print("The file %s does not exist \n" % value)
    else:
        # groupby alternates between runs of header lines (starting with '>') and sequence lines.
        ToSeperate = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in ToSeperate:
            header = next(header)[1:].strip()
            Sequence = "".join(s.strip() for s in next(ToSeperate))
            # Only the first record in the file is returned.
            return Sequence

def MutationFile(value):
    try:
        fh = open(value, 'r')
        content = fh.read()
        Rmcontent = str(content.rstrip())
    except IOError:
        print("The file %s does not exist \n" % value)
    else:
        # One list element per character of the file content.
        con = list(Rmcontent)
        return con

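def Mutation(SEQUENCES, conA):
    R = len(conA)
    if R > 1:
        out = []
        # Each record such as "E2R\n" spans 4 characters, so the position digit
        # sits at index 1, 5, 9, ... and the new residue at index 2, 6, 10, ...
        # This only works for single-digit positions.
        SecondNum = 1
        ThirdChar = 2
        for index in range(len(conA)):
            MR = conA[index]
            if index == SecondNum:
                SN = MR
                SecondNum = SecondNum + 4
            if index == ThirdChar:
                TC = MR
                ThirdChar = ThirdChar + 4

                SecNum = int(SN.rstrip())
                MutateResidue = str(TC.rstrip())
                # Use a separate name so the outer loop variable is not shadowed.
                for pos in range(len(SEQUENCES)):
                    if pos == SecNum - 1:
                        NonMutate = SEQUENCES[pos]
                        AfterMutate = NonMutate.replace(NonMutate, MutateResidue)
                        new = SEQUENCES[:pos] + AfterMutate + SEQUENCES[pos + 1:]
                        MutatedInformation = ['>', NonMutate, pos + 1, MutateResidue, '\n', new]
                        values2 = ''.join(str(i) for i in MutatedInformation)
                        # Collect each single mutant; the original computed values2 but discarded it.
                        out.append(values2)
        return out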

if __name__ == "__main__":
    seq, conA = main(sys.argv[1:])
    Mutation(seq, conA)

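This is part of my program. I replace the E, C, E at positions (2, 4, 6) with R, W, G, and then store the mutated sequences in a variable called R, which contains three lines as follows: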

PrACCEL
PEAwCEL
PEACCgL

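Now I want to make 2- and 3-way combinations from these three single mutations, that is, sequences with two mutations in one line and sequences with three mutations in one line.

Sample and expected output: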

2C
   PrAwCEL 
   PrACCgL 
   PEAwCgL 
3C
   PrAwCgL 

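Algorithm

This is part of my code, so I will explain my algorithm:

1. I read the mutation file, where each entry has three characters, for example (E2R): (E) is the amino acid letter, (2) is its position in the input sequence PEACCEL, and the third letter (R) is what the E at position 2 will become.

2. First, I extract the position and the third character from the user mutation file and store them in the variables SecNum and MutateResidue (the third char).

3. Then I loop over the sequence (PEACCEL) by index, and wherever an index matches SecNum (E2, C4, E6), I replace that residue with MutateResidue, which is the third character in the mutation file (2R, 4W, 6G).

4. Finally, I join the mutated residue with the rest of the sequence using this line: new = SEQUENCES[:index] + AfterMutate + SEQUENCES[index+1:]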
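
The four steps above can be sketched as two small helpers. This is a minimal illustration and not the asker's code: the names `parse_mutation` and `apply_mutation` are hypothetical, and it assumes every mutation is written as original residue, 1-based position, new residue (for example E2R). Using a regular expression instead of fixed character offsets also allows positions with more than one digit.

```python
import re

# Hypothetical helpers for illustration; not part of the original program.
MUTATION_RE = re.compile(r"^([A-Za-z])(\d+)([A-Za-z])$")


def parse_mutation(text):
    """Split a string such as 'E2R' into ('E', 2, 'R')."""
    match = MUTATION_RE.match(text.strip())
    if match is None:
        raise ValueError("not a valid mutation: %r" % text)
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_mutation(sequence, mutation):
    """Return a copy of sequence with one parsed mutation applied."""
    original, position, replacement = mutation
    index = position - 1  # positions in the mutation file are 1-based
    if not 0 <= index < len(sequence):
        raise ValueError("position %d is outside the sequence" % position)
    if sequence[index] != original:
        raise ValueError("expected %s at position %d, found %s"
                         % (original, position, sequence[index]))
    # Lower-case the new residue so the change is visible, as in the sample output.
    return sequence[:index] + replacement.lower() + sequence[index + 1:]


if __name__ == "__main__":
    for text in ["E2R", "C4W", "E6G"]:
        print(apply_mutation("PEACCEL", parse_mutation(text)))
```

Checking the original residue before replacing it is a design choice here: it catches a mutation file that does not match the input sequence instead of silently producing a wrong mutant.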

Thanks in advance.

1 Answer:

Answer 0: (score: 0)

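from itertools import combinations,chain
from collections import Counter


def Mutation(SEQUENCES,conA):

    # conA is expected to be a list of mutation strings such as ['E2R', 'C4W', 'E6G'], e.g. read with:
    #mutations=map(lambda x:x.strip(),open('a.txt','r').readlines())

    # All combinations of size 1, 2 and 3, chained into a single iterable.
    mutation_combinations= chain.from_iterable([list(combinations(conA,i))for i in range(1,4)])
    #[('E2R',), ('C4W',), ('E6G',), ('E2R', 'C4W'), ('E2R', 'E6G'), ('C4W', 'E6G'), ('E2R', 'C4W', 'E6G')]

    for i in mutation_combinations:
        print("combination :"+'_'.join(i))
        # Maps position -> replacement letter; Counter is used here as a plain dict.
        c=Counter({})
        temp_string=SEQUENCES
        for j in i:
            # j[1] is the position and j[2] the new residue, so only single-digit positions work.
            c[j[1]]=j[2].lower()
        for index,letter in c.items():
            temp_string=temp_string[:int(index)-1]+letter+temp_string[int(index):]
        print(temp_string)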

Output

combination :E2R
PrACCEL
combination :C4W
PEAwCEL
combination :E6G
PEACCgL
combination :E2R_C4W
PrAwCEL
combination :E2R_E6G
PrACCgL
combination :C4W_E6G
PEAwCgL
combination :E2R_C4W_E6G
PrAwCgL
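
The seven results above are every non-empty subset of the three mutations: 3 singles, 3 pairs and 1 triple. As a rough check, the sketch below counts the combinations for each size; the function name `count_combinations` is hypothetical. For n mutations the total is 2**n - 1, so the output grows quickly as mutations are added.

```python
from itertools import combinations


def count_combinations(mutations):
    """Map each combination size k to the number of k-sized combinations."""
    return {k: len(list(combinations(mutations, k)))
            for k in range(1, len(mutations) + 1)}


if __name__ == "__main__":
    counts = count_combinations(["E2R", "C4W", "E6G"])
    print(counts)                # {1: 3, 2: 3, 3: 1}
    print(sum(counts.values()))  # 7
```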

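The algorithm I followed:

  1. Read the mutation strings (E2R, etc.) from a file using mutations=map(lambda x:x.strip(),open('a.txt','r').readlines())
  2. Build the combinations of the mutations with mutation_combinations= chain.from_iterable([list(combinations(mutations,i))for i in range(1,4)]). If you have 4 mutations and want combinations of up to all four, change the range value to 5.
  3. For each combination, I replace the characters at the specified positions:

    for j in i:
        c[j[1]]=j[2].lower()
    
  4. I use the counter above to keep track of which characters to replace for each mutation combination.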
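
To get output grouped the way the question asks (a 2C block followed by a 3C block), one option is to loop over the combination size and print a header per size. The following is a sketch under assumptions and not the answer's code: `mutate` and `grouped_mutants` are hypothetical names, mutations are given as strings like E2R with valid 1-based positions, and no two mutations target the same position.

```python
from itertools import combinations


def mutate(sequence, mutations):
    """Apply every mutation string (e.g. 'E2R') in mutations to sequence."""
    residues = list(sequence)
    for mutation in mutations:
        position = int(mutation[1:-1])  # 1-based, may have several digits
        residues[position - 1] = mutation[-1].lower()
    return "".join(residues)


def grouped_mutants(sequence, mutations, sizes=(2, 3)):
    """Return {size: [mutated sequences]} for each requested combination size."""
    return {size: [mutate(sequence, combo)
                   for combo in combinations(mutations, size)]
            for size in sizes}


if __name__ == "__main__":
    result = grouped_mutants("PEACCEL", ["E2R", "C4W", "E6G"])
    for size, mutants in result.items():
        print("%dC" % size)
        for mutant in mutants:
            print("   " + mutant)
```

Building the mutant from a list of residues avoids repeated string slicing, and slicing the position as `mutation[1:-1]` removes the single-digit limit of indexing `j[1]`.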