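I want to extract the value of var2 that corresponds to the minimum value of var1 within each building-month combination. Here is my (fake) data set: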
head(mydata)
# building month var1 var2
#1 A 1 -26.96333 376.9633
#2 A 1 165.38759 317.3993
#3 A 1 47.46345 271.0137
#4 A 2 73.47784 294.8171
#5 A 2 107.80130 371.7668
#6 A 2 10.16384 308.7975
Reproducible code:
## create fake data set:
set.seed(142)
mydata1 = data.frame(building = rep(LETTERS[1:5],6),month = sort(rep(1:6,5)),var1=rnorm(30,50,35),var2 = runif(30,200,400))
mydata2 = data.frame(building = rep(LETTERS[1:5],6),month = sort(rep(1:6,5)),var1=rnorm(30,60,35),var2 = runif(30,150,400))
mydata3 = data.frame(building = rep(LETTERS[1:5],6),month = sort(rep(1:6,5)),var1=rnorm(30,40,35),var2 = runif(30,250,400))
mydata = rbind(mydata1,mydata2,mydata3)
mydata = mydata[ order(mydata[,"building"], mydata[,"month"]), ]
row.names(mydata) = 1:nrow(mydata)
## here is how I pull the minimum value of var1 for each building-month combination:
require(reshape2)
require(plyr)  # provides the .() helper used in subset below
m1 = melt(mydata, id.vars = 1:2)
d1 = dcast(m1, building ~ month, function(x) min(max(x,0), na.rm=TRUE),
           subset = .(variable == "var1"))
This extracts the minimum value of var1 for each building-month combination...
head(d1)
# building 1 2 3 4 5 6
#1 A 165.38759 107.80130 93.32816 73.23279 98.55546 107.58780
#2 B 92.08704 98.94959 57.79610 94.10530 80.86883 99.75983
#3 C 93.38284 100.13564 52.26178 62.37837 91.98839 97.44797
#4 D 82.43440 72.43868 66.83636 105.46263 133.02281 94.56457
#5 E 70.09756 61.44406 30.78444 68.24334 94.35605 61.60610
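However, what I want is a data frame laid out exactly like d1 that instead shows the value of var2 corresponding to the minimum value of var1 (as shown in d1 above). My gut tells me it should be a variation on which.min(), but I haven't gotten it to work with dcast() or ddply(). Any help is appreciated!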
Answer 0 (score: 3)
This could probably be done in a single step, but I'm more familiar with reshape2 than with plyr:
library(plyr)  # ddply(), summarize and .(); dcast() comes from reshape2
dcast(ddply(mydata, .(building, month), summarize, value = var2[which.min(var1)]),
      building ~ month, value.var = "value")
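For comparison, the same "var2 at the row where var1 is smallest" lookup can be sketched in base R, without plyr or reshape2. This is only an illustrative alternative, not the approach used in the answer above. The names `toy`, `idx` and `res` are made up for this example, and the data are just the six rows printed by head(mydata) in the question, so the result covers building A, months 1 and 2 only.

```r
# Toy data: the six rows shown by head(mydata) in the question (hypothetical name `toy`)
toy <- data.frame(
  building = rep("A", 6),
  month    = rep(1:2, each = 3),
  var1     = c(-26.96333, 165.38759, 47.46345, 73.47784, 107.80130, 10.16384),
  var2     = c(376.9633, 317.3993, 271.0137, 294.8171, 371.7668, 308.7975)
)

# For each building-month cell, find the row number where var1 is smallest
idx <- tapply(seq_len(nrow(toy)), toy[c("building", "month")],
              function(i) i[which.min(toy$var1[i])])

# Look up var2 at those rows, keeping the building x month layout
res <- array(toy$var2[idx], dim = dim(idx), dimnames = dimnames(idx))
res
```

On the full data set, `res` would be a building-by-month matrix rather than a data frame, so it may need `as.data.frame()` to match d1's shape exactly. Also note that the question's aggregator `min(max(x, 0))` reduces to `max(x, 0)`, so d1 as printed holds maxima. If the maximum is what is actually wanted, swapping `which.min` for `which.max` would be the analogous change.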