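I am having trouble sampling within a given background, or excluding certain possibilities from the sampling.
I am trying to write an R function that shuffles genomic regions.
Currently the function works well; its steps are described in the comments of the code below.
The function takes a GenomicRanges object. Here is its code: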
library(GenomicRanges)

GrShuffle <- function(regions, chromSizes = LoadChromSizes("hg19")) {
  # Get the length (width) of every region in the query.
  regionsLength <- width(regions)
  # A region of width w on a chromosome of size S can start anywhere in
  # 1..(S - w + 1), because GRanges coordinates are 1-based and inclusive.
  possibleStarts <- chromSizes[as.character(seqnames(regions)), ] -
    regionsLength + 1
  # Draw one random start per region from its range of possible starts.
  randomStarts <- unlist(lapply(possibleStarts, sample.int, size = 1))
  granges <- GRanges(seqnames(regions),
                     IRanges(start = randomStarts, width = regionsLength),
                     strand = strand(regions))
  return(granges)
}
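But now I need to use a universe, that is, another set of regions that determines the ranges within which the shuffling may take place. The universe acts as a restriction on the sampling. It is another set of regions, just like the query, and no region should be shuffled to a position outside of it.
Any clues on how to sample within ranges in R?
Using lapply rather than a loop is important, because it greatly reduces the function's execution time.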
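On the execution-time point, the per-region draw can also be done without lapply, using a single vectorized call. This is only a minimal sketch: the `possibleStarts` values below are made-up stand-ins, and the approach assumes the upper bounds are small enough that scaling a uniform draw gives an acceptably uniform integer.

```r
# Hypothetical upper bounds, one per region (stand-ins for possibleStarts).
possibleStarts <- c(90, 195, 240, 100)

# runif() returns values strictly between 0 and 1, so floor(u * n) lies in
# 0..(n - 1); adding 1 gives one draw in 1..n for each region.
randomStarts <- floor(runif(length(possibleStarts)) * possibleStarts) + 1
```

Each element of `randomStarts` plays the same role as one `sample.int(n, size = 1)` call in the lapply version.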
[Edit]
Here is a reproducible example that does not use GenomicRanges, to simplify as much as possible what I am trying to achieve.
## GENERATE A RANDOM QUERY
chromSizes <- c(100, 200, 250)
names(chromSizes) <- c("1", "2", "3")
queryChrom <- sample(names(chromSizes), 100, replace = TRUE)
queryLengths <- sample(10, 100, replace = TRUE)
queryPossibleStarts <- chromSizes[queryChrom] - queryLengths
queryStarts <- unlist(lapply(queryPossibleStarts, sample.int, size = 1))
# stringsAsFactors = FALSE keeps chrom as character, so that
# chromSizes[query$chrom] below indexes by name and not by factor code.
query <- data.frame(chrom = queryChrom,
                    start = queryStarts,
                    end = queryStarts + queryLengths,
                    stringsAsFactors = FALSE)
##
## SIMPLIFIED FUNCTION
# Get the length of every region in the query.
regionsLength <- query$end - query$start
# The possible starts are the chromosome sizes minus the region lengths.
possibleStarts <- chromSizes[query$chrom] - regionsLength
# Draw one random start per region from its range of possible starts.
randomStarts <- unlist(lapply(possibleStarts, sample.int, size = 1))
shuffledQuery <- data.frame(chrom = query$chrom,
                            start = randomStarts,
                            end = randomStarts + regionsLength,
                            stringsAsFactors = FALSE)
##
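To make the universe restriction concrete, here is a rough sketch of one way it might be done with the same data frame layout as the simplified example. It is not a tested solution for GenomicRanges objects. The helper name `sampleStartInUniverse` and the small `universe` and `query` tables are hypothetical. It assumes the universe regions do not overlap each other, and that a region of length `len` fits in a universe region when `end - start >= len`. The idea is to count the valid starts in each universe region on the right chromosome, draw one offset uniformly over all of them, and map that offset back to a coordinate.

```r
# Hypothetical helper: draw one start for a region of length `len` on
# chromosome `chrom`, restricted to the (non-overlapping) universe regions.
sampleStartInUniverse <- function(chrom, len, universe) {
  # Universe regions on the same chromosome.
  u <- universe[universe$chrom == chrom, ]
  # Number of valid starts in each universe region: start .. (end - len).
  nStarts <- pmax(u$end - u$start - len + 1, 0)
  if (sum(nStarts) == 0) return(NA_integer_)  # the region fits nowhere
  # Draw an offset uniformly over all valid starts, then find its region.
  offset <- sample.int(sum(nStarts), size = 1)
  cum <- cumsum(nStarts)
  i <- which(cum >= offset)[1]
  as.integer(u$start[i] + offset - (cum[i] - nStarts[i]) - 1)
}

# Made-up universe and query, for illustration only.
universe <- data.frame(chrom = c("1", "1", "2"),
                       start = c(10, 60, 20),
                       end   = c(40, 90, 120),
                       stringsAsFactors = FALSE)
query <- data.frame(chrom = c("1", "2", "1"),
                    start = c(5, 50, 70),
                    end   = c(15, 80, 75),
                    stringsAsFactors = FALSE)

regionsLength <- query$end - query$start
# mapply is the multi-argument relative of lapply/sapply: one call per region.
randomStarts <- mapply(sampleStartInUniverse, query$chrom, regionsLength,
                       MoreArgs = list(universe = universe),
                       USE.NAMES = FALSE)
shuffledQuery <- data.frame(chrom = query$chrom,
                            start = randomStarts,
                            end = randomStarts + regionsLength,
                            stringsAsFactors = FALSE)
```

A region that fits in no universe region on its chromosome gets `NA` as its start, so such cases can be filtered out or handled separately afterwards.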