以下是样本数据集。
id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")
X1 <- runif(n=4, min=1, max=10)
X2 <- runif(n=4, min=1, max=10)
X3 <- runif(n=4, min=1, max=10)
X4 <- runif(n=4, min=1, max=10)
X5 <- runif(n=4, min=1, max=10)
X6 <- runif(n=4, min=1, max=10)
X7 <- runif(n=4, min=1, max=10)
X8 <- runif(n=4, min=1, max=10)
X9 <- runif(n=4, min=1, max=10)
X10 <- runif(n=4, min=1, max=10)
X11 <- runif(n=4, min=1, max=10)
X12 <- runif(n=4, min=1, max=10)
df <- data.frame(id,start,end,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12)
colnames(df)<-c("id","start","end","Jan 2001","Feb 2001","Mar 2001","Apr 2001","May 2001","Jun 2001",
"Jul 2001","Aug 2001","Sep 2001","Oct 2001","Nov 2001","Dec 2001")
df
id start end Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001
1 1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2 2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3 3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4 4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
Aug 2001 Sep 2001 Oct 2001 Nov 2001 Dec 2001
1 7.393482 1.865180 5.316736 6.737959 8.783017
2 7.816893 4.021888 7.086448 1.728219 1.553020
3 5.443161 7.489278 9.848638 7.072435 1.294177
4 8.853365 8.899155 5.768139 1.414094 2.322848
我想计算每个id的列平均值,从相应的开始到结束月份(包括开始和结束)。 例如。
id start end average
2 Jun 2001 Sep 2001 average of Jun, Jul, Aug and Sep 2001
我的第一个想法是为每个月分配索引。因此,不需要处理年份数据格式。似乎更容易。
# generate index for month data
df.i <- df
df.i$start.i[df.i$start == "Jan 2001"] <- 1
df.i$start.i[df.i$start == "Feb 2001"] <- 2
df.i$start.i[df.i$start == "Mar 2001"] <- 3
df.i$start.i[df.i$start == "Apr 2001"] <- 4
df.i$start.i[df.i$start == "May 2001"] <- 5
df.i$start.i[df.i$start == "Jun 2001"] <- 6
df.i$start.i[df.i$start == "Jul 2001"] <- 7
df.i$start.i[df.i$start == "Aug 2001"] <- 8
df.i$start.i[df.i$start == "Sep 2001"] <- 9
df.i$start.i[df.i$start == "Oct 2001"] <- 10
df.i$start.i[df.i$start == "Nov 2001"] <- 11
df.i$start.i[df.i$start == "Dec 2001"] <- 12
df.i$end.i[df.i$end == "Jan 2001"] <- 1
df.i$end.i[df.i$end == "Feb 2001"] <- 2
df.i$end.i[df.i$end == "Mar 2001"] <- 3
df.i$end.i[df.i$end == "Apr 2001"] <- 4
df.i$end.i[df.i$end == "May 2001"] <- 5
df.i$end.i[df.i$end == "Jun 2001"] <- 6
df.i$end.i[df.i$end == "Jul 2001"] <- 7
df.i$end.i[df.i$end == "Aug 2001"] <- 8
df.i$end.i[df.i$end == "Sep 2001"] <- 9
df.i$end.i[df.i$end == "Oct 2001"] <- 10
df.i$end.i[df.i$end == "Nov 2001"] <- 11
df.i$end.i[df.i$end == "Dec 2001"] <- 12
colnames(df.i)<-c("id","start","end","1","2","3","4","5","6",
"7","8","9","10","11","12","start.i","end.i")
df.i
id start end 1 2 3 4 5 6 7
1 1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2 2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3 3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4 4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
8 9 10 11 12 start.i end.i
1 7.393482 1.865180 5.316736 6.737959 8.783017 7 8
2 7.816893 4.021888 7.086448 1.728219 1.553020 6 9
3 5.443161 7.489278 9.848638 7.072435 1.294177 5 7
4 8.853365 8.899155 5.768139 1.414094 2.322848 5 11
谢谢。
答案 0 :(得分:0)
index.r<-1
for (index.r in 1:nrow(df.i)){
df.i$mean[index.r] <- apply(df.i[index.r,as.character(which(as.numeric(colnames(df.i[index.r, yearmonlist]))>=df.i$start.i[index.r]
& as.numeric(colnames(df.i[index.r, yearmonlist]))<=df.i$end.i[index.r]))], 1, mean)
}
这个似乎有效。
答案 1 :(得分:0)
您的数据,为可重复性设置种子。
id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")
set.seed(123)
df <- data.frame(id, start, end, matrix(runif(n=4*12, min=1, max=10), ncol=12))
df$start <- as.character(df$start)
df$end <- as.character(df$end)
colnames(df)<-c("id", "start", "end", paste(month.abb, 2001))
您可以尝试申请。这将&#34;循环&#34;通过每一行,按开始和结束的名称进行子集化。重要的是,开始&amp;结束名称必须与df的名称匹配。最后,平均值是在子集上计算的。
apply(df, 1, function(x, y) mean(as.numeric(x[which(y == x[2]):which(y == x[3])])), colnames(df))
[1] 5.251895 6.273809 5.537480 6.815905