I'm trying to run many random sampling trials, and in any single trial I may not draw every item.
Right now, I'm doing this:
test <- sample(rownames(data), size=10000, replace=TRUE, prob=data$refFraction)
Not every element of rownames(data) shows up in the result, but I need all of them for the next step.
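I'd like every sample to give me a vector of the same length (and in the same order), so that I can combine the samples into a matrix. I'm also not sure how best to do that: how do I generate thousands of these test vectors and combine them with one of the apply functions?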
Edit: Based on the answers, I came up with this:
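trials <- function(fractions, kmers, times, ref_size) {
  # draw ref_size k-mers (with replacement, weighted by fractions), repeated `times` times
  replicate(times, sample(kmers, size=ref_size, replace=TRUE, prob=fractions), simplify=FALSE)
}
result <- trials(data$refFraction, rownames(data), 100, 1000)
# result is a list of 100 character vectors, so this gives a 100 x 1 list-matrix
mat <- matrix(result, nrow=100)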
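But what I actually want is just the number of times each item appears in each trial, including items with a count of zero, so that I end up with a rectangular matrix.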
The desired result looks like this:
       "A" "B" "C"
Trial1   2   5   6
Trial2   3   7  12
Trial3   0   5  14
dput(head(data)):
structure(list(refCount = c(3142L, 4102L, 1975L, 2009L, 2363L,
2437L), refFraction = c(0.00300290255094, 0.00392040301208, 0.00188756605287,
0.00192006086086, 0.00225838915591, 0.00232911314979), readCount = c(147L,
719L, 356L, 418L, 745L, 766L), readFraction = c(0.00029577107721,
0.00144666261574, 0.000716289139367, 0.000841036124312, 0.00149897586749,
0.00154122887852), foldChange = c(2.31774884958, 0.996935198459,
0.968959564031, 0.825477549838, 0.409869676355, 0.412907501432
), p_value = c(5.05923221341436e-321, 4.46023836252119e-170,
2.29230878162415e-77, 1.73499617494115e-59, 2.80547347576314e-15,
4.32620038741552e-16)), .Names = c("refCount", "refFraction",
"readCount", "readFraction", "foldChange", "p_value"), row.names = c("AAAAA",
"AAAAT", "AAAAG", "AAAAC", "AAATA", "AAATT"), class = "data.frame")
Answer 0 (score: 1)
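It's not entirely clear what you're trying to do, but this may help.
replicate is well suited to repeated sampling. Here I create a 5-row data frame d and then sample its row names ten separate times. Used this way, replicate returns a matrix (one column per replicate), which sounds like what you need.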
> d <- data.frame(x = 1:5, y = 6:10)
> replicate(10, sample(rownames(d)))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "5" "1" "1" "3" "4" "1" "4" "5" "3" "1"
# [2,] "4" "5" "2" "2" "3" "5" "1" "2" "1" "2"
# [3,] "1" "4" "5" "5" "5" "4" "3" "3" "2" "3"
# [4,] "2" "3" "3" "1" "1" "2" "2" "4" "4" "5"
# [5,] "3" "2" "4" "4" "2" "3" "5" "1" "5" "4"
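A possible extension of the example above, sketched under assumptions: instead of returning the raw draws, each replicate can return per-item counts. Wrapping the draws in factor() with every row name as a level makes table() report zero for rows that were never drawn, so every replicate yields a vector of the same length and order. The names n_trials and draw_size are hypothetical, and the values are arbitrary illustration.

```r
d <- data.frame(x = 1:5, y = 6:10)

# hypothetical settings for illustration only
n_trials <- 3
draw_size <- 10

set.seed(1)
# each replicate returns a length-5 count vector (one entry per row name,
# zeros included), so replicate() simplifies the results into a 5 x n_trials matrix
counts <- replicate(n_trials,
  table(factor(sample(rownames(d), size = draw_size, replace = TRUE),
               levels = rownames(d))))

# transpose so that each trial is a row, matching the layout asked for in the question
counts <- t(counts)
rownames(counts) <- paste0("Trial", seq_len(n_trials))
counts
```

Every row sums to draw_size, and the columns always appear in the order of rownames(d), whichever items happen to be drawn.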
Answer 1 (score: 0)
This is how I ended up doing it:
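# num_trials and trial_size are not defined in the original answer; these example
# values mirror the question's trials(..., 100, 1000) call
num_trials <- 100
trial_size <- 1000

trial_fn <- function(counts) {
  # subsample trial_size reads without replacement, num_trials times
  replicate(num_trials, sample(counts, size=trial_size, replace=FALSE), simplify=FALSE)
}
tableize <- function(x) {
  # fixed levels make table() report all 1024 k-mers, including zero counts
  tmp <- as.vector(table(factor(x, levels=1:1024)))
  tmp/sum(tmp)  # proportions rather than raw counts
}
# one element per read: k-mer index i repeated readCount[i] times (assumes 1024 rows)
counts <- rep(1:1024, times=data$readCount[1:1024])
trials <- trial_fn(counts)
# 1024 x num_trials matrix; use t() to get one row per trial
trial_table <- sapply(trials, tableize)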
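Converting the draws to a factor with an explicit set of levels, and then calling table on that factor, is the answer to the original question: table then reports every level, including those with a count of zero.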
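For comparison, here is a sketch that combines the two answers and produces the trial-by-item count layout shown in the question. It assumes only the refFraction column and row names from the question's dput(head(data)) output; count_trials is a hypothetical name, and the 100 trials of 1000 draws mirror the question's own call. Note that sample() normalizes prob internally, so the fractions do not need to sum to 1.

```r
# first six rows of the question's data (refFraction column only, from dput above)
data <- data.frame(
  refFraction = c(0.00300290255094, 0.00392040301208, 0.00188756605287,
                  0.00192006086086, 0.00225838915591, 0.00232911314979),
  row.names = c("AAAAA", "AAAAT", "AAAAG", "AAAAC", "AAATA", "AAATT")
)

# hypothetical helper: one row per trial, one column per k-mer
count_trials <- function(fractions, kmers, times, ref_size) {
  res <- t(replicate(times, {
    draws <- sample(kmers, size = ref_size, replace = TRUE, prob = fractions)
    # every k-mer is a factor level, so unseen k-mers get a count of 0
    table(factor(draws, levels = kmers))
  }))
  rownames(res) <- paste0("Trial", seq_len(times))
  res
}

set.seed(42)
mat <- count_trials(data$refFraction, rownames(data), 100, 1000)
head(mat, 3)
```

Returning counts from inside replicate, rather than collecting a list of draws first, keeps the result a plain integer matrix, so no separate list-to-matrix step is needed.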