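I have an existing data.frame containing some initial values. What I want to do is create another data.frame that holds 10 randomly sampled rows for every row in the first data.frame. I am also trying to do this the R way, so I would like to avoid explicit iteration.
So far I have managed to apply a function to every row of the table that generates a single value, but I do not know how to extend this so that each application generates 10 rows and the results are then bound back together.
Here is my progress so far:
Example data: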
starts <- structure(list(instance = structure(21:26, .Label = c("big_1",
"big_10", "big_11", "big_12", "big_13", "big_14", "big_15", "big_16",
"big_17", "big_18", "big_19", "big_2", "big_20", "big_3", "big_4",
"big_5", "big_6", "big_7", "big_8", "big_9", "competition01",
"competition02", "competition03", "competition04", "competition05",
"competition06", "competition07", "competition08", "competition09",
"competition10", "competition11", "competition12", "competition13",
"competition14", "competition15", "competition16", "competition17",
"competition18", "competition19", "competition20", "med_1", "med_10",
"med_11", "med_12", "med_13", "med_14", "med_15", "med_16", "med_17",
"med_18", "med_19", "med_2", "med_20", "med_3", "med_4", "med_5",
"med_6", "med_7", "med_8", "med_9", "small_1", "small_10", "small_11",
"small_12", "small_13", "small_14", "small_15", "small_16", "small_17",
"small_18", "small_19", "small_2", "small_20", "small_3", "small_4",
"small_5", "small_6", "small_7", "small_8", "small_9"), class = "factor"),
event.clashes = c(674L, 626L, 604L, 1036L, 991L, 929L), overlaps = c(0L,
0L, 0L, 0L, 0L, 0L), room.valid = c(324L, 320L, 268L, 299L,
294L, 220L), final.timeslot = c(0L, 0L, 0L, 0L, 0L, 0L),
three.in.a.row = c(246L, 253L, 259L, 389L, 365L, 430L), single.event = c(97L,
120L, 97L, 191L, 150L, 138L)), .Names = c("instance", "event.clashes",
"overlaps", "room.valid", "final.timeslot", "three.in.a.row",
"single.event"), row.names = c(NA, 6L), class = "data.frame")
Code:
library(reshape)
m.starts <- melt(starts)
df <- data.frame()
gen.data <- function(x){
  inst <- x[1]
  constr <- x[2]
  v <- as.integer(x[3])
  val <- as.integer(rnorm(1, max(0, v), v / 2))
  # Should probably return a data.frame here
  print(paste(inst, constr, val))
}
apply(m.starts, 1, gen.data)
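Answer 0 (score: 6)
I am not entirely clear on what you are trying to do, but the following change to your gen.data function seems to do what you want. Specifically, I am not sure what you intend with val, since it just draws a random number from a normal distribution whose mean is that row's value column and whose standard deviation is that value divided by 2. Is that what you want? I added a new argument to your function to control how many rows are generated: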
gen.data <- function(x, nreps = 10){
  inst <- x[1]
  constr <- x[2]
  v <- as.integer(x[3])
  val <- as.integer(rnorm(nreps, max(0, v), v / 2))
  out <- data.frame(inst = rep(inst, nreps)
                    , constr = rep(constr, nreps)
                    , val = val)
  return(out)
}
Then use:
do.call("rbind", apply(m.starts, 1, gen.data))
Result:
inst constr val
1 competition01 event.clashes 876
2 competition01 event.clashes 714
3 competition01 event.clashes 912
4 competition01 event.clashes -46
5 competition01 event.clashes 369
....
....
357 competition06 single.event 149
358 competition06 single.event 248
359 competition06 single.event 128
360 competition06 single.event 168
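One detail worth keeping in mind: apply() coerces the data.frame to a character matrix before calling the function, which is why gen.data has to call as.integer() on x[3]. The following is only a sketch of an alternative that loops over row indices with lapply() so the columns keep their original types. The names m.small and gen.rows are hypothetical and stand in for m.starts and gen.data; the column names assume the instance/variable/value layout produced by melt().

```r
# Hypothetical long-format data standing in for m.starts
m.small <- data.frame(
  instance = c("competition01", "competition02"),
  variable = c("event.clashes", "event.clashes"),
  value    = c(674L, 626L),
  stringsAsFactors = FALSE
)

# Generate nreps draws for row i of d; columns keep their original types
gen.rows <- function(i, d, nreps = 10) {
  v <- d$value[i]
  data.frame(
    inst   = rep(d$instance[i], nreps),
    constr = rep(d$variable[i], nreps),
    val    = as.integer(rnorm(nreps, mean = max(0, v), sd = v / 2))
  )
}

set.seed(1)  # only to make the sketch reproducible
res <- do.call(rbind, lapply(seq_len(nrow(m.small)), gen.rows, d = m.small))
dim(res)     # 2 input rows x 10 draws each, 3 columns
```

Like the apply() version, this still builds one small data.frame per input row and binds them at the end, so it does not change the overall approach, only the type handling.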
Answer 1 (score: 1)
There is no need for apply or rbind. A simple vector subset is enough:
samples <- sample(1:nrow(starts), nrow(starts)*10, replace=TRUE)
starts[samples, 1:3]
First 5 rows of the result:
> head(starts[samples, 1:3], 5)
instance event.clashes overlaps
2 competition02 626 0
5 competition05 991 0
6 competition06 929 0
4 competition04 1036 0
2.1 competition02 626 0
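Note that sampling indices with replacement draws from all rows at once, so the result has 10 times as many rows in total but does not necessarily contain each original row exactly 10 times. A minimal sketch of the difference, using a hypothetical three-row data frame d in place of starts:

```r
# Hypothetical small data frame standing in for starts
d <- data.frame(id = c("a", "b", "c"), x = c(10L, 20L, 30L))

set.seed(42)  # only to make the sketch reproducible
idx <- sample(seq_len(nrow(d)), nrow(d) * 10, replace = TRUE)
resampled <- d[idx, ]

nrow(resampled)       # 30 rows in total
table(resampled$id)   # counts per original row; not necessarily 10 each

# For exactly 10 copies of every row, repeat the indices instead of sampling
repeated <- d[rep(seq_len(nrow(d)), each = 10), ]
table(repeated$id)    # 10 for each of a, b, c
```

Which of the two is appropriate depends on whether "10 rows per original row" is meant on average or exactly.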
Answer 2 (score: 0)
You can combine the ideas from Andrie's and Chase's solutions as follows:
# Repeat each row of the melted data (m.starts from the question) ten times
start.m1 <- m.starts[rep(1:nrow(m.starts), each = 10), ]
# Create an extended vector used to define the means and sds
m <- rep(m.starts$value, each = 10)
# Clamp negative values to 0 (none occur in your data)
m[m <= 0] <- 0
# Replace value with rnorm draws
start.m1$value <- rnorm(nrow(start.m1), mean = m, sd = m / 2)
This produces something like the following:
> head(start.m1)
instance variable value
1 competition01 event.clashes 1098.0220
1.1 competition01 event.clashes 1208.4304
1.2 competition01 event.clashes 883.7976
1.3 competition01 event.clashes 365.1396
1.4 competition01 event.clashes 862.3113
1.5 competition01 event.clashes 1352.7085
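I am using Andrie's suggestion of subset indexing to expand the data frame, together with Chase's interpretation of your question, in which you appear to want to actually generate values via rnorm rather than resample the original rows themselves. The key point here is that rnorm is vectorized.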
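To illustrate what vectorized means here, the short sketch below passes a whole vector of means and standard deviations to a single rnorm call, so each draw uses its own parameters. The values in m are hypothetical and chosen only for illustration.

```r
set.seed(123)  # only to make the sketch reproducible

# Hypothetical means, each repeated 4 times (one entry per draw)
m <- rep(c(100, 0, 50), each = 4)

# One call: draw i uses mean m[i] and standard deviation m[i] / 2
draws <- rnorm(length(m), mean = m, sd = m / 2)

# Where the mean is 0 the sd is also 0, so those draws are exactly 0
draws[5:8]
```

Because the parameters are matched element by element, no loop or apply call is needed to give every row its own distribution.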