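I am trying to write an R function that samples a variable number of 5-character substrings from the strings in each row of a data frame, where the number of samples depends on the length of the original string. I first calculated how many draws I want for each row, and I would like to use that in the function so that the number of samples per row is based on that row's "num_draws" column. My idea was to write the function for a generic instance and then use an apply statement outside the function to operate on each row, but I can't figure out how to set up the function so that it refers to column 3 as a generic instance (rather than picking up the first row's value or the values of all rows).
Example data frame: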
BP TF num_draws
1 CGGCGCATGTTCGGTAATGA TFTTTFTTTFFTTFTTTTTF 6
2 ATAAGATGCCCAGAGCCTTTTCATGTACTA TFTFTFTFFFFFFTTFTTTTFTTTTFFTTT 9
3 TCTTAGGAAGGATTC FTTTTTTTTTFFFFF 4
Desired output:
[1] GGCGC FTTTF
AATGA TTTTF
TGTTC TTFFT
TAATG TTTTT
AATGA TTTTF
CGGCG TFTTT
[2] AGATG FTFTF
ATAAG TFTFT
ATGCC FTFFF
GCCCA FFFFF
ATAAG TFTFT
GTACT TFFTT
GCCCA FFFFF
TGCCC TFFFF
AGATG FTFTF
[3] TTAGG TTTTT
CTTAG TTTTT
GGAAG TTTTT
GGATT TTFFF
#make example data frame
BaseP1 <- paste(sample(size = 20, x = c("A","C","T","G"), replace = TRUE), collapse = "")
BaseP2 <- paste(sample(size = 30, x = c("A","C","T","G"), replace = TRUE), collapse = "")
BaseP3 <- paste(sample(size = 15, x = c("A","C","T","G"), replace = TRUE), collapse = "")
TrueFalse1 <- paste(sample(size = 20, x = c("T","F"), replace = TRUE), collapse = "")
TrueFalse2 <- paste(sample(size = 30, x = c("T","F"), replace = TRUE), collapse = "")
TrueFalse3 <- paste(sample(size = 15, x = c("T","F"), replace = TRUE), collapse = "")
my_df <- data.frame(c(BaseP1,BaseP2,BaseP3), c(TrueFalse1, TrueFalse2, TrueFalse3))
#calculate number of draws by length
frag_length<- 5
my_df<- cbind(my_df, (round((nchar(my_df[,1]) / frag_length) * 1.5, digits = 0)))
colnames(my_df) <- c("BP", "TF", "num_draws")
#function to sample x number of draws per row (this does not work)
Fragment = function(string) {
  nStart = sample(1:(nchar(string) -5), 1)
  samp<- substr(string, nStart, nStart + 4)
  replicate(n= string[,3], expr = samp)
}
apply(my_df[,1:2], c(1,2), Fragment)
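Answer 0 (score: 2)
One option is to change the function to take another argument, n, and to create nStart inside the replicate call, so that a new start position is drawn on every replicate: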
Fragment = function(string, n) {
  # Draw a fresh start position on every replicate
  replicate(n = n, {
    nStart <- sample(1:(nchar(string) - 5), 1)
    substr(string, nStart, nStart + 4)
  })
}
apply(my_df, 1, function(x) data.frame(lapply(x[1:2], Fragment, n = x[3])))
#$`1`
# BP TF
#1 GGCGC FFTTF
#2 GGTAA TFFTT
#3 GCGCA TTFTT
#4 CGCAT TFFTT
#5 GGCGC FTTTF
#6 TGTTC FTTFT
#$`2`
# BP TF
#1 GTACT TTTTF
#2 ATAAG FTTFT
#3 GTACT TFTFF
#4 TAAGA TTTTF
#5 CCTTT FFTTF
#6 TCATG TTTTF
#7 CCAGA TFTFT
#8 TTCAT TFTFT
#9 CCCAG FTFTF
#$`3`
# BP TF
#1 AAGGA TTTFF
#2 AGGAT TTTTT
#3 CTTAG TFFFF
#4 TAGGA TTTFF
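In the desired output shown in the question, each TF fragment appears to come from the same positions as its BP fragment, whereas calling Fragment separately on each column draws independent start positions. If aligned fragments are what is wanted, a minimal sketch could look like the following. The helper name fragment_pair and its width argument are hypothetical and not part of the original code. It also assumes that the last possible window should be eligible, so start positions run up to nchar - width + 1 rather than nchar - 5.

```r
# Hypothetical helper: draw n windows of `width` characters, reusing each
# start position for both strings so the BP and TF fragments stay aligned.
fragment_pair <- function(bp, tf, n, width = 5) {
  stopifnot(nchar(bp) == nchar(tf), nchar(bp) >= width)
  # Assumption: valid starts run from 1 to nchar - width + 1,
  # so the final window of the string can also be drawn.
  starts <- sample.int(nchar(bp) - width + 1, size = n, replace = TRUE)
  data.frame(
    BP = substring(bp, starts, starts + width - 1),
    TF = substring(tf, starts, starts + width - 1),
    stringsAsFactors = FALSE
  )
}

# Example data copied from the question.
my_df <- data.frame(
  BP = c("CGGCGCATGTTCGGTAATGA",
         "ATAAGATGCCCAGAGCCTTTTCATGTACTA",
         "TCTTAGGAAGGATTC"),
  TF = c("TFTTTFTTTFFTTFTTTTTF",
         "TFTFTFTFFFFFFTTFTTTTFTTTTFFTTT",
         "FTTTTTTTTTFFFFF"),
  num_draws = c(6, 9, 4),
  stringsAsFactors = FALSE
)

# Map() walks the three columns in parallel, so num_draws stays numeric
# instead of being coerced to character as it is inside apply().
result <- Map(fragment_pair, my_df$BP, my_df$TF, my_df$num_draws)
names(result) <- seq_len(nrow(my_df))
result
```

The result is a list with one data frame per input row, each holding num_draws rows. Because the draws are random, the fragments differ between runs unless set.seed() is called first.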