Question

I am working with a data frame that contains a timeslot, an ID, and several variables.

set.seed(1)
timeslot = c(rep(1,3),rep(2,3),rep(3,3),rep(4,3))
ID = c("A","B","C","D","B","C","C","B","C","A","B","D")
variable1 = c(as.integer(rnorm(12,2)**3)-1)
variable2 = c(as.integer(rnorm(12,4)**2)+1)
df = data.frame(timeslot,ID,variable1,variable2)

The data frame:

   timeslot ID variable1 variable2
1         1  A         1        12
2         1  B         9         4
3         1  C         0        27
4         2  D        45        16
5         2  B        11        16
6         2  C         0        25
7         3  C        14        24
8         3  B        19        22
9         3  C        16        25
10        4  A         3        23
11        4  B        42        17
12        4  D        12         5

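I want to perform some calculations on specific rows only. For each row, I check whether its ID (A, B, C or D) occurs not only in the row's own timeslot but also in the previous timeslot. The result is stored in the column min1, which is added with the following code (thanks to code_is_entropy):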

df$min1 <- sapply(seq(nrow(df)), function(x)
{
  if(df[x, "timeslot"] == 1){0} else {
    max(df[x, "ID"] %in% df[df$timeslot == df[x,"timeslot"] - 1,"ID"])}
})

   timeslot ID variable1 variable2 min1
1         1  A         1        12    0
2         1  B         9         4    0
3         1  C         0        27    0
4         2  D        45        16    0
5         2  B        11        16    1
6         2  C         0        25    1
7         3  C        14        24    1
8         3  B        19        22    1
9         3  C        16        25    1
10        4  A         3        23    0
11        4  B        42        17    1
12        4  D        12         5    0
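
The same flag can probably also be computed without looping over the rows. The following is only a minimal sketch: it assumes that timeslots are consecutive integers and that there is no timeslot 0, and the names key_now, key_prev and min1_vec are made up for this illustration.

```r
# Rebuild the two relevant columns so the sketch runs on its own
timeslot <- c(rep(1,3), rep(2,3), rep(3,3), rep(4,3))
ID <- c("A","B","C","D","B","C","C","B","C","A","B","D")

# Key of each row's own (timeslot, ID) pair
key_now <- paste(timeslot, ID)
# Key the same ID would have in the previous timeslot
key_prev <- paste(timeslot - 1, ID)

# 1 if the ID also occurs in the previous timeslot, otherwise 0
min1_vec <- as.integer(key_prev %in% key_now)
min1_vec
```

For the example data this gives the same 0/1 values as the min1 column shown above.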

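The calculations should be carried out for the rows where min1 is 1. The other rows can have the value 0. I would like to include several different functions (including self-written ones). These functions should be applied to every variable (in reality there are more than 2).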
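
To make the requirement concrete, the functions could be collected in a named list and applied to one set of values at a time. This is only an illustrative sketch: the list name funs and the helper behind count_not_zero are hypothetical, and the example values are the variable1 entries of ID C in timeslots 2 and 3.

```r
# Hypothetical named list of summary functions, including a self-written one
funs <- list(
  sum            = sum,
  max            = max,
  count          = length,
  count_not_zero = function(x) sum(x != 0)  # number of non-zero entries
)

# variable1 values of ID "C" in timeslots 2 and 3
vals <- c(0, 14, 16)

# Apply every function to the same values; returns a named numeric vector
res <- sapply(funs, function(f) f(vals))
res
```

Keeping the functions in one list means a new function only has to be added in a single place, whatever mechanism is then used to select the rows.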

With the following code I managed to do this for the functions sum, max, count & count_not_zero (for variable1 only):

sumprev = c()
maxprev = c()
lengthprev = c()
lengthnotzeroprev = c()
for (j in seq_len(nrow(df))){
  if (df[j,"min1"] == 0){
    # ID is not present in the previous timeslot: all results are 0
    sumprev = c(sumprev,0); maxprev = c(maxprev,0); lengthprev = c(lengthprev,0); lengthnotzeroprev = c(lengthnotzeroprev,0)
  } else {
    # variable1 values of the same ID in the current and the previous timeslot
    vals = df[which((df[,"timeslot"] == df[j,"timeslot"] | df[,"timeslot"] == (df[j,"timeslot"]-1)) & df[,"ID"]==(df[j,"ID"])),"variable1"]
    sumprev = c(sumprev,sum(vals))
    maxprev = c(maxprev,max(vals))
    lengthprev = c(lengthprev,length(vals))
    # count the non-zero values explicitly instead of indexing vals by itself
    lengthnotzeroprev = c(lengthnotzeroprev,sum(vals != 0))
  }
}
df = cbind(df,sumprev,maxprev, lengthprev,lengthnotzeroprev)


   timeslot ID variable1 variable2 min1 sumprev maxprev lengthprev lengthnotzeroprev
1         1  A         1        12    0       0       0          0                 0
2         1  B         9         4    0       0       0          0                 0
3         1  C         0        27    0       0       0          0                 0
4         2  D        45        16    0       0       0          0                 0
5         2  B        11        16    1      20      11          2                 2
6         2  C         0        25    1       0       0          2                 0
7         3  C        14        24    1      30      16          3                 2
8         3  B        19        22    1      30      19          2                 2
9         3  C        16        25    1      30      16          3                 2
10        4  A         3        23    0       0       0          0                 0
11        4  B        42        17    1      61      42          2                 2
12        4  D        12         5    0       0       0          0                 0

This is rather crude, and I cannot figure out how to do this easily for multiple functions and multiple variables.

I have tried to get something working with the data.table package, but I cannot figure out how to make it work.

Apologies if I have done something wrong; I am new to both R and StackOverflow, and I am not a native speaker.

Perform multiple calculations based on multiple conditions across multiple columns in R
