How to solve this in R (combinations)

Time: 2016-06-11 12:30:51

Tags: r

My sequence codes are defined as follows:

A   =   A   
C   =   C   
G   =   G   
T   =   T 
W   =   A or T  
S   =   C or G  
M   =   A or C  
K   =   G or T 
R   =   A or G  
Y   =   C or T 
B   =   C or G or T 
D   =   A or G or T 
H   =   A or C or T 
V   =   A or C or G 
N   =   A or C or G or T

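I want to compute all the combinations (every concrete sequence) that a sequence containing these codes can stand for.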

Here are a few examples of sequences whose combinations I want to obtain:

Example 1 ATGTTTGARCCACGYATHCCTAC 

Example 2 CAACGTCGTAATAAGGAAGTTTAG 

Example 3 CAGGTTGAGTATYTWCAAATTAC

Example 4 AAACCATRATGCCATTATAATATTG

If you can suggest code that does this, I would appreciate the help.

2 Answers:

Answer 0: (score: 3)

You can use chartr to replace the characters:

#Your string
x <- "ATGCTGATCGAGCTANATCGATCGGACTACY"

# Get all combinations of replacement strings
# paste together for chartr function
ex <- do.call(paste0, expand.grid(N = c("A", "T", "G", "C"), Y = c("C", "T")))

# loop through each combination, replacing the characters in the string
sapply(ex, chartr, old="NY", x=x)

#                                AC                                TC 
# "ATGCTGATCGAGCTAAATCGATCGGACTACC" "ATGCTGATCGAGCTATATCGATCGGACTACC" 
#                                GC                                CC 
# "ATGCTGATCGAGCTAGATCGATCGGACTACC" "ATGCTGATCGAGCTACATCGATCGGACTACC" 
#                                AT                                TT 
# "ATGCTGATCGAGCTAAATCGATCGGACTACT" "ATGCTGATCGAGCTATATCGATCGGACTACT" 
#                                GT                                CT 
# "ATGCTGATCGAGCTAGATCGATCGGACTACT" "ATGCTGATCGAGCTACATCGATCGGACTACT" 
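One detail of this approach is worth seeing on a small input: chartr maps the i-th character of `old` to the i-th character of `new` everywhere in the string, so every occurrence of the same code receives the same replacement. The sketch below uses a made-up toy string, `x_demo`, which is not from the question, to illustrate this.

```r
# Toy input (an assumption for illustration): the code N appears twice
x_demo <- "ANNY"

# Same construction as in the answer: one replacement string per combination
ex_demo <- do.call(paste0, expand.grid(N = c("A", "T", "G", "C"), Y = c("C", "T")))

# chartr(old = "NY", new = "AC", ...) turns every N into A and every Y into C
res_demo <- unname(sapply(ex_demo, chartr, old = "NY", x = x_demo))

print(res_demo)
length(res_demo)  # 8: both N positions always carry the same base
```

If the two N positions were allowed to vary independently there would be 4 * 4 * 2 = 32 strings, so this approach is complete only when each ambiguity code occurs at most once in the sequence.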

Updated for your extended examples:

# lookup table of replacements
lookup <- list(
A   =   'A' ,  
C   =   'C' ,  
G   =   'G' ,  
T   =   'T', 
W   =   c('A', 'T')  ,
S   =   c('C', 'G') , 
M   =   c('A', 'C') , 
K   =   c('G', 'T'), 
R   =   c('A', 'G') , 
Y   =   c('C', 'T') ,
B   =   c('C', 'G', 'T') ,
D   =   c('A', 'G', 'T') ,
H   =   c('A', 'C', 'T') ,
V   =   c('A', 'C', 'G') ,
N   =   c('A', 'C', 'G', 'T'))

# Get the unique characters that occur in the sequence
yourseq <- "ATGTTTGARCCACGYATHCCTAC" # example 1
uniq.char <- unique(strsplit(yourseq, "")[[1]])

# subset the lookup table to the characters found in the sequence
# (this keeps the expand.grid result at a more reasonable size),
# find all combinations of their replacements
# and paste each combination together
ex <- do.call(expand.grid, lookup[uniq.char])
vec <- do.call(paste0, ex)

# Get all sequences
sapply(vec, chartr, old=paste(names(ex), collapse=""), x=yourseq)
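Because chartr applies one replacement per distinct character, the code above covers all combinations only when each ambiguity code occurs once in the sequence, which is the case for the four examples in the question. For sequences where a code repeats, a position-by-position expansion may be closer to what is wanted. The following is a minimal sketch under that assumption; `iupac`, `count_expansions` and `expand_positions` are hypothetical names introduced here, and the table repeats the contents of `lookup`.

```r
# Same code table as `lookup` above, repeated so the sketch is self-contained
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  W = c("A", "T"), S = c("C", "G"), M = c("A", "C"), K = c("G", "T"),
  R = c("A", "G"), Y = c("C", "T"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Number of concrete sequences: product of the option counts per position
count_expansions <- function(seq, table = iupac) {
  chars <- strsplit(seq, "")[[1]]
  prod(lengths(table[chars]))
}

# Expand every position independently (hypothetical helper)
expand_positions <- function(seq, table = iupac) {
  chars <- strsplit(seq, "")[[1]]
  unknown <- setdiff(chars, names(table))
  if (length(unknown) > 0) {
    stop("unknown code(s): ", paste(unknown, collapse = ", "))
  }
  # one vector of alternatives per position, in sequence order
  choices <- unname(table[chars])
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  # paste the columns (positions) together row by row
  do.call(paste0, unname(as.list(grid)))
}

res1 <- expand_positions("ATGTTTGARCCACGYATHCCTAC")  # example 1
length(res1)  # 2 * 2 * 3 = 12 (R, Y and H)
```

The result grows multiplicatively with every ambiguous position, so checking `count_expansions()` before calling `expand_positions()` on a long sequence is probably a sensible precaution.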

Answer 1: (score: 1)

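Assuming:

s1 <- "ATGCTGATCGAGCTA"      # fixed segment before the N
s2 <- "ATCGATCGGACTAC"       # fixed segment between the N and the Y
sep <- "**"                  # marker placed around each substituted base
t1 <- c("A", "T", "G", "C")  # alternatives for N
t2 <- c("C", "T")            # alternatives for Y

Here is what you can do:

# index every (t1, t2) pair, then paste the pieces together
res <- expand.grid(a = seq_along(t1), b = seq_along(t2))
paste0(s1, sep, t1[res[, 1]], sep, s2, sep, t2[res[, 2]], sep)

Output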

[1] "ATGCTGATCGAGCTA**A**ATCGATCGGACTAC**C**" "ATGCTGATCGAGCTA**T**ATCGATCGGACTAC**C**"
[3] "ATGCTGATCGAGCTA**G**ATCGATCGGACTAC**C**" "ATGCTGATCGAGCTA**C**ATCGATCGGACTAC**C**"
[5] "ATGCTGATCGAGCTA**A**ATCGATCGGACTAC**T**" "ATGCTGATCGAGCTA**T**ATCGATCGGACTAC**T**"
[7] "ATGCTGATCGAGCTA**G**ATCGATCGGACTAC**T**" "ATGCTGATCGAGCTA**C**ATCGATCGGACTAC**T**"
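The `**` separators in this output only mark where a base was substituted. If plain sequences are wanted, the same pieces can be pasted without them. The sketch below is a variation on this answer rather than part of it: it passes the alternatives to expand.grid directly instead of their indices, and the column names `N` and `Y` as well as the variable `plain` are chosen here for illustration.

```r
s1 <- "ATGCTGATCGAGCTA"      # fixed segment before the N
s2 <- "ATCGATCGGACTAC"       # fixed segment between the N and the Y
t1 <- c("A", "T", "G", "C")  # alternatives for N
t2 <- c("C", "T")            # alternatives for Y

# One row per (N, Y) pair; keep the values as character, not factor
res <- expand.grid(N = t1, Y = t2, stringsAsFactors = FALSE)

# Paste fixed segments and substituted bases back together
plain <- paste0(s1, res$N, s2, res$Y)
print(plain)
```

Since expand.grid varies its first argument fastest, the eight strings should come out in the same order as the marked output above, just without the markers.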