I am working with a data frame that has a timeslot, an ID, and multiple variables.
set.seed(1)
timeslot = c(rep(1,3),rep(2,3),rep(3,3),rep(4,3))
ID = c("A","B","C","D","B","C","C","B","C","A","B","D")
variable1 = c(as.integer(rnorm(12,2)**3)-1)
variable2 = c(as.integer(rnorm(12,4)**2)+1)
df = data.frame(timeslot,ID,variable1,variable2)
The data frame:
timeslot ID variable1 variable2
1 1 A 1 12
2 1 B 9 4
3 1 C 0 27
4 2 D 45 16
5 2 B 11 16
6 2 C 0 25
7 3 C 14 24
8 3 B 19 22
9 3 C 16 25
10 4 A 3 23
11 4 B 42 17
12 4 D 12 5
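I want to perform some calculations on specific rows. For each row, I check whether its ID (A, B, C or D) is present not only in its own timeslot but also in the previous timeslot. The result is the column min1, added with the following code (thanks to code_is_entropy):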
df$min1 <- sapply(seq(nrow(df)), function(x)
{
if(df[x, "timeslot"] == 1){0} else {
max(df[x, "ID"] %in% df[df$timeslot == df[x,"timeslot"] - 1,"ID"])}
})
timeslot ID variable1 variable2 min1
1 1 A 1 12 0
2 1 B 9 4 0
3 1 C 0 27 0
4 2 D 45 16 0
5 2 B 11 16 1
6 2 C 0 25 1
7 3 C 14 24 1
8 3 B 19 22 1
9 3 C 16 25 1
10 4 A 3 23 0
11 4 B 42 17 1
12 4 D 12 5 0
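The same flag can probably also be computed without a row-wise sapply. The following is only a minimal sketch, assuming the data frame shown above (rebuilt here from the printed table so the block is self-contained). The names key_now and key_prev are made up for illustration. Rows in timeslot 1 get 0 automatically as long as the data contains no timeslot 0.

```r
# Example data copied from the table above
df <- data.frame(
  timeslot  = rep(1:4, each = 3),
  ID        = c("A", "B", "C", "D", "B", "C", "C", "B", "C", "A", "B", "D"),
  variable1 = c(1, 9, 0, 45, 11, 0, 14, 19, 16, 3, 42, 12),
  variable2 = c(12, 4, 27, 16, 16, 25, 24, 22, 25, 23, 17, 5)
)

# Build a (timeslot, ID) key for every row, and the key the same ID
# would have in the previous timeslot (hypothetical helper names).
key_now  <- paste(df$timeslot, df$ID)
key_prev <- paste(df$timeslot - 1, df$ID)

# A row is flagged when its ID also occurs in the previous timeslot.
df$min1 <- as.integer(key_prev %in% key_now)

print(df)
```

Because %in% is vectorized, this makes one pass over the keys instead of subsetting the data frame once per row.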
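The calculations should be done for these rows (the ones where min1 is 1). The other rows can have the value 0. I want to include several different functions (including self-written ones). These functions should be applied to all variables (in reality there are more than 2).
I managed to do this for the functions sum, max, count and count_not_zero with this code: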
# Accumulators for the per-row results
sumprev = c()
maxprev = c()
lengthprev = c()
lengthnotzeroprev = c()
for (j in seq_len(nrow(df))){
  if (df[j,"min1"] == 0){sumprev = c(sumprev,0); maxprev = c(maxprev,0); lengthprev = c(lengthprev,0); lengthnotzeroprev = c(lengthnotzeroprev,0)} else {
    # variable1 values for the same ID in the current or the previous timeslot
    x = df[which((df[,"timeslot"] == df[j,"timeslot"] | df[,"timeslot"] == (df[j,"timeslot"]-1)) & df[,"ID"] == df[j,"ID"]),"variable1"]
    sumprev = c(sumprev,sum(x))
    maxprev = c(maxprev,max(x))
    lengthprev = c(lengthprev,length(x))
    # count of the non-zero values
    lengthnotzeroprev = c(lengthnotzeroprev,sum(x != 0))
  }
}
df = cbind(df,sumprev,maxprev,lengthprev,lengthnotzeroprev)
timeslot ID variable1 variable2 min1 sumprev maxprev lengthprev lengthnotzeroprev
1 1 A 1 12 0 0 0 0 0
2 1 B 9 4 0 0 0 0 0
3 1 C 0 27 0 0 0 0 0
4 2 D 45 16 0 0 0 0 0
5 2 B 11 16 1 20 11 2 2
6 2 C 0 25 1 0 0 2 0
7 3 C 14 24 1 30 16 3 2
8 3 B 19 22 1 30 19 2 2
9 3 C 16 25 1 30 16 3 2
10 4 A 3 23 0 0 0 0 0
11 4 B 42 17 1 61 42 2 2
12 4 D 12 5 0 0 0 0 0
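This is quite crude, and I can't figure out how to easily do this for multiple functions and multiple variables.
I tried to get something working with the data.table package, but I can't seem to figure out how to get it done.
Sorry if I did something wrong; I am new to R and StackOverflow, and I am not a native speaker.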
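One possible direction, shown only as a rough base R sketch and not as a definitive solution: keep the functions in a named list and loop over variables and functions. The helper window_stat, the list funs, and the generated column names (for example variable1_sum_prev) are hypothetical names chosen for this illustration. The data, including min1, is copied from the tables above so the block runs on its own.

```r
# Example data copied from the tables above (min1 included)
df <- data.frame(
  timeslot  = rep(1:4, each = 3),
  ID        = c("A", "B", "C", "D", "B", "C", "C", "B", "C", "A", "B", "D"),
  variable1 = c(1, 9, 0, 45, 11, 0, 14, 19, 16, 3, 42, 12),
  variable2 = c(12, 4, 27, 16, 16, 25, 24, 22, 25, 23, 17, 5),
  min1      = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0)
)

# Functions to apply; count_not_zero stands in for a self-written one
funs <- list(
  sum            = sum,
  max            = max,
  count          = length,
  count_not_zero = function(x) sum(x != 0)
)
vars <- c("variable1", "variable2")

# For row j: apply `stat` to the values of column `var` that belong to the
# same ID in the current or the previous timeslot; rows with min1 == 0 get 0.
window_stat <- function(data, j, var, stat) {
  if (data$min1[j] == 0) return(0)
  sel <- data$ID == data$ID[j] &
    data$timeslot %in% (data$timeslot[j] - 0:1)
  stat(data[sel, var])
}

# One new column per (variable, function) pair
for (v in vars) {
  for (fn in names(funs)) {
    new_col <- paste(v, fn, "prev", sep = "_")
    df[[new_col]] <- sapply(
      seq_len(nrow(df)),
      function(j) window_stat(df, j, v, funs[[fn]])
    )
  }
}

print(df)
```

Adding another variable or function then only means extending vars or funs. For large data this row-wise approach may still be slow, so a grouped or join-based solution could be preferable.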