I am working with a data frame:
df <- data.frame(a = c("gene1", "gene2", "gene3", ...),
b = c(10, 20, 30, ...))
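I want to create a new data frame with 100 columns, each containing a different random selection of 250 genes from column a of the original data frame. This is what I have tried so far: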
data.frame(matrix(data = df[sample(nrow(df), 250), 1],
ncol = 100, nrow = 250))
However, this fills every column with the same random sample instead of giving each column its own.
Answer 0 (score: 0)
Here you go, using 10 columns instead of 100 and 5 genes instead of 250:
# a has 100 genes, so b gets 100 values to match (10, 20, ..., 1000)
df <- data.frame(a = paste0("gene", 1:100),
                 b = seq(10, 1000, 10),
                 stringsAsFactors = FALSE)
# replicate() re-evaluates sample() on each iteration, so every column is a separate draw
random_samples <- replicate(10, df[sample(nrow(df), 5), 1])
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "gene14" "gene100" "gene13" "gene5" "gene20" "gene68" "gene24" "gene57" "gene54" "gene44"
# [2,] "gene71" "gene67" "gene44" "gene25" "gene90" "gene45" "gene46" "gene69" "gene76" "gene3"
# [3,] "gene54" "gene34" "gene97" "gene67" "gene10" "gene50" "gene62" "gene54" "gene49" "gene58"
# [4,] "gene81" "gene18" "gene50" "gene60" "gene56" "gene7" "gene42" "gene82" "gene50" "gene51"
# [5,] "gene12" "gene71" "gene31" "gene19" "gene50" "gene2" "gene15" "gene95" "gene59" "gene23"
# with seeds (one seed per column, so each column is reproducible)
library(magrittr) # provides %>%
seeds <- 1:10
seeds %>%
  sapply(function(x) { set.seed(x); df[sample(nrow(df), 5), 1] }) %>%
  as.data.frame() %>% setNames(paste0("S", seeds))
# S1 S2 S3 S4 S5 S6 S7 S8 S9 S10
# 1 gene27 gene19 gene17 gene59 gene21 gene61 gene99 gene47 gene23 gene51
# 2 gene37 gene70 gene80 gene1 gene68 gene93 gene40 gene21 gene3 gene31
# 3 gene57 gene57 gene38 gene29 gene90 gene26 gene12 gene79 gene21 gene42
# 4 gene89 gene17 gene32 gene27 gene28 gene37 gene7 gene64 gene98 gene68
# 5 gene20 gene91 gene58 gene79 gene11 gene78 gene24 gene31 gene43 gene9
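For the sizes in the question (100 columns of 250 genes), the same idea might look like the sketch below. The attempt in the question draws a single sample of 250 values, and matrix() then recycles that one vector to fill all 100 columns, which is why every column comes out identical. The data frame here is a hypothetical stand-in with 1000 genes, since the real data is elided in the question, and the seed and the "S" column names are arbitrary choices for illustration.

```r
# Hypothetical stand-in for the original data (the real df is not shown in full).
df <- data.frame(a = paste0("gene", 1:1000),
                 b = seq(10, 10000, 10),
                 stringsAsFactors = FALSE)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# replicate() evaluates sample() once per column, so each of the
# 100 columns is an independent draw of 250 genes without replacement.
random_samples <- as.data.frame(
  replicate(100, sample(df$a, 250)),
  stringsAsFactors = FALSE
)
names(random_samples) <- paste0("S", seq_len(ncol(random_samples)))

dim(random_samples)
```

Within a column no gene repeats, because sample() draws without replacement by default; the same gene can still show up in several columns.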