My sequences use the following (IUPAC nucleotide ambiguity) codes:
A = A
C = C
G = G
T = T
W = A or T
S = C or G
M = A or C
K = G or T
R = A or G
Y = C or T
B = C or G or T
D = A or G or T
H = A or C or T
V = A or C or G
N = A or C or G or T
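I want to enumerate every concrete sequence (combination) that an ambiguous sequence represents.
Here are a few example sequences I want to expand: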
Example 1 ATGTTTGARCCACGYATHCCTAC
Example 2 CAACGTCGTAATAAGGAAGTTTAG
Example 3 CAGGTTGAGTATYTWCAAATTAC
Example 4 AAACCATRATGCCATTATAATATTG
I would be grateful for any code that can do this.
Answer 0 (score: 3)
You can use chartr to replace the characters:
#Your string
x <- "ATGCTGATCGAGCTANATCGATCGGACTACY"
# Get all combinations of replacement strings
# paste together for chartr function
ex <- do.call(paste0, expand.grid(N = c("A", "T", "G", "C"), Y = c("C", "T")))
# Loop through each combination, replacing the characters in the string
sapply(ex, chartr, old="NY", x=x)
# AC TC
# "ATGCTGATCGAGCTAAATCGATCGGACTACC" "ATGCTGATCGAGCTATATCGATCGGACTACC"
# GC CC
# "ATGCTGATCGAGCTAGATCGATCGGACTACC" "ATGCTGATCGAGCTACATCGATCGGACTACC"
# AT TT
# "ATGCTGATCGAGCTAAATCGATCGGACTACT" "ATGCTGATCGAGCTATATCGATCGGACTACT"
# GT CT
# "ATGCTGATCGAGCTAGATCGATCGGACTACT" "ATGCTGATCGAGCTACATCGATCGGACTACT"
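As a sanity check (a minimal sketch, not part of the original answer): if every position varies independently, the number of resulting sequences is the product of the number of bases each code allows. The names n_options and n_combinations are hypothetical; the counts come straight from the code table in the question.

```r
# Number of bases each IUPAC code stands for (from the table in the question)
n_options <- c(A = 1, C = 1, G = 1, T = 1, W = 2, S = 2, M = 2, K = 2,
               R = 2, Y = 2, B = 3, D = 3, H = 3, V = 3, N = 4)

# Hypothetical helper: count the concrete sequences an ambiguous sequence
# represents, treating every position independently
n_combinations <- function(seq) {
  chars <- strsplit(seq, "")[[1]]
  prod(n_options[chars])
}

n_combinations("ATGCTGATCGAGCTANATCGATCGGACTACY")  # 4 * 2 = 8
n_combinations("ATGTTTGARCCACGYATHCCTAC")          # 2 * 2 * 3 = 12
```

For x above this gives 8, matching the eight strings chartr produced. chartr replaces every copy of a code with the same base, so its output count equals this product only when each ambiguity code occurs once in the sequence.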
Updated for your extended examples:
# lookup table of replacements
lookup <- list(
A = 'A' ,
C = 'C' ,
G = 'G' ,
T = 'T',
W = c('A', 'T') ,
S = c('C', 'G') ,
M = c('A', 'C') ,
K = c('G', 'T'),
R = c('A', 'G') ,
Y = c('C', 'T') ,
B = c('C', 'G', 'T') ,
D = c('A', 'G', 'T') ,
H = c('A', 'C', 'T') ,
V = c('A', 'C', 'G') ,
N = c('A', 'C', 'G', 'T'))
# Get the unique characters that appear in the sequence
yourseq <- "ATGTTTGARCCACGYATHCCTAC" # example 1
uniq.char <- unique(strsplit(yourseq, "")[[1]])
# Subset the lookup to only the characters found in the sequence;
# this keeps the number of expand.grid combinations reasonable.
# Find all combinations of these
# and paste them together
ex <- do.call(expand.grid, lookup[uniq.char])
vec <- do.call(paste0, ex)
# Get all sequences
sapply(vec, chartr, old=paste(names(ex), collapse=""), x=yourseq)
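One caveat with the chartr approach: it replaces every occurrence of a code with the same base, so a sequence with two Ns only yields 4 variants instead of 16. A minimal sketch of a per-position alternative (a different technique from the answer above; expand_sequence and choices are hypothetical names) looks up the options for each position separately and expands those. It assumes the sequence is uppercase and uses only the codes in the table.

```r
# Lookup table (same as above), repeated so this block runs on its own
lookup <- list(
  A = "A", C = "C", G = "G", T = "T",
  W = c("A", "T"), S = c("C", "G"), M = c("A", "C"), K = c("G", "T"),
  R = c("A", "G"), Y = c("C", "T"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T"))

# Hypothetical helper: expand each position independently, so repeated
# ambiguity codes (e.g. two Ns) are not forced to take the same base
expand_sequence <- function(seq, lookup) {
  chars <- strsplit(seq, "")[[1]]
  choices <- lookup[chars]  # one character vector per position
  grid <- expand.grid(unname(choices), stringsAsFactors = FALSE)
  do.call(paste0, grid)     # paste the columns back into full sequences
}

expand_sequence("ATGTTTGARCCACGYATHCCTAC", lookup)  # 12 sequences
expand_sequence("ANNT", lookup)                      # 16 sequences
```

The number of results grows multiplicatively with each ambiguous position, so long sequences with many N codes can produce very large outputs.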
Answer 1 (score: 1)
Assuming:
s1="ATGCTGATCGAGCTA"
s2="ATCGATCGGACTAC"
sep="**"  # marks the substituted bases in the output
t1=c("A","T","G","C")
t2=c("C","T")
Here is what you can do:
res=do.call(expand.grid, list(a=seq_along(t1), b=seq_along(t2)))
paste0(s1,sep,t1[res[,1]],sep,s2,sep,t2[res[,2]],sep)
Output
[1] "ATGCTGATCGAGCTA**A**ATCGATCGGACTAC**C**" "ATGCTGATCGAGCTA**T**ATCGATCGGACTAC**C**"
[3] "ATGCTGATCGAGCTA**G**ATCGATCGGACTAC**C**" "ATGCTGATCGAGCTA**C**ATCGATCGGACTAC**C**"
[5] "ATGCTGATCGAGCTA**A**ATCGATCGGACTAC**T**" "ATGCTGATCGAGCTA**T**ATCGATCGGACTAC**T**"
[7] "ATGCTGATCGAGCTA**G**ATCGATCGGACTAC**T**" "ATGCTGATCGAGCTA**C**ATCGATCGGACTAC**T**"
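Because sep only highlights the substituted bases, a minimal sketch for recovering plain sequences is to strip it afterwards. The answer's objects are rebuilt so the block runs on its own; marked and plain are hypothetical names.

```r
# Rebuild the answer's inputs and marked-up output
s1 <- "ATGCTGATCGAGCTA"
s2 <- "ATCGATCGGACTAC"
sep <- "**"
t1 <- c("A", "T", "G", "C")
t2 <- c("C", "T")
res <- expand.grid(a = seq_along(t1), b = seq_along(t2))
marked <- paste0(s1, sep, t1[res$a], sep, s2, sep, t2[res$b], sep)

# fixed = TRUE matches "**" literally; otherwise it would be parsed as a
# regular expression, where * is a quantifier
plain <- gsub(sep, "", marked, fixed = TRUE)
plain[1]  # "ATGCTGATCGAGCTAAATCGATCGGACTACC"
```

The plain strings come out in the same order as the chartr output in Answer 0, since both grids vary the first position fastest.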