Is it possible to iterate over a data.table by group?

Asked: 2019-06-19 10:45:01

Tags: r data.table

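I want to classify categories by priority, taking into account which of them overlap. When categories overlap, the column overlap is set to 1; the rows are then split into groups so that the categories within each group can be evaluated by priority.

Which is the better option: going through each group with a for loop and evaluating the cases separately, or is there a vectorized function that would simplify the task?

The problem is that I have to evaluate the rows one at a time, and the only way I can see to do that is a for loop, but I would like to reuse the code I already have.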
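As a small illustration of the "by group" part of the question: data.table evaluates the `j` expression once per group when `by` is given, and cumulative functions such as `cummin()` cover many "one row at a time" cases without an explicit loop. The table and column names below (`grp`, `value`, `running_min`) are made up for this sketch and are not part of the code in the question.

```r
library(data.table)

# Toy table; names and values are illustrative only
dt <- data.table(grp = c(1, 1, 2, 2, 2), value = c(5, 3, 4, 1, 2))

# j runs once per group: .N is the group size, .GRP the group counter
res <- dt[, .(n_rows = .N, group_no = .GRP, smallest = min(value)), by = grp]

# cummin() gives the smallest value seen so far within each group
dt[, running_min := cummin(value), by = grp]
```

Here `res` has one row per group, while `running_min` keeps one value per original row, which is the shape needed when each row depends only on the rows before it in the same group.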

# Build the test data
library(data.table)

overlap  <- c(0, 1, 1, 1, 0, 1, NA)
Priority <- c(3, 2, 1, 4, 5, 6, 7)
category <- c("a", "b", "c", "d", "e", "f", "g")

data.dt <- data.table(overlap, Priority, category)

# No overlap in the last row: the next value is unknown
data.dt$overlap[nrow(data.dt)] <- 0


# grp increases at every falling edge of overlap (1 -> 0); it defines the groups to evaluate
data.dt[, grp := cumsum(c(TRUE, diff(overlap) < 0))]
# Per group, row indices except rows where both this overlap and the next one are 0
i1 <- data.dt[, .I[!(data.table::shift(overlap, type = "lead") == 0 & overlap == 0)], .(grp)]$V1
# Row indices where overlap is 0
i2 <- data.dt[, .I[overlap == 0]]
# Rows of i1 that actually overlap
i3 <- setdiff(i1, i2)
# Per group, take the category with the lowest Priority value
data.dt[i1, out := category[which.min(Priority)], .(grp)]
# Rows with overlap == 0 have no overlapping category, so out is NA
data.dt[i2, out := NA]

# Drop the trailing NA from i3
i3 <- na.omit(i3)

# Per group, cumulative comma-separated lists of the categories other than the lowest-Priority one
v2 <- data.dt[i1, {
  v1 <- category[-which.min(Priority)]
  sapply(seq_along(v1), function(i) toString(v1[seq_len(i)]))
}, .(grp)]$V1
# Fill rest on the rows where overlap is not 0, then drop the helper column grp
data.dt[i3, rest := v2][, grp := NULL][]

I get this result:

   overlap Priority     category  out    rest
1:       0        3            a <NA>    <NA>
2:       1        2            b    c       a
3:       1        1            c    c    a, b
4:       1        4            d    c a, b, d
5:       0        5            e <NA>    <NA>
6:       1        6            f    e       f
7:       0        7            g <NA>    <NA>

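The problem is in the second row: at that point there is only an overlap with the first category, yet my code already shows the category from the third row.

The result I want is: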

   overlap Priority     category  out    rest
1:       0        3            a <NA>    <NA>
2:       1        2            b    b       a
3:       1        1            c    c    a, b
4:       1        4            d    c a, b, d
5:       0        5            e <NA>    <NA>
6:       1        6            f    e       f
7:       0        7            g <NA>    <NA>
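Reading the desired table, `out` on each overlapping row appears to be the category with the lowest Priority among the rows seen so far in the same group, and `rest` the other categories seen so far. A minimal sketch of that reading, not a confirmed answer, is shown here. It assumes the Priority values printed in the tables (3, 2, 1, 4, 5, 6, 7), no NA in Priority, and that the last overlap is already set to 0; `running_best` is a hypothetical helper introduced only for this example.

```r
library(data.table)

# Toy data as printed in the tables of the question
dt <- data.table(
  overlap  = c(0, 1, 1, 1, 0, 1, 0),
  Priority = c(3, 2, 1, 4, 5, 6, 7),
  category = c("a", "b", "c", "d", "e", "f", "g")
)

# A new group starts at every falling edge of overlap (1 -> 0)
dt[, grp := cumsum(c(TRUE, diff(overlap) < 0))]

# Hypothetical helper: for row i of one group, look only at rows 1..i
running_best <- function(priority, category) {
  n <- length(priority)
  # Position of the lowest Priority value among the first i rows
  best <- vapply(seq_len(n),
                 function(i) which.min(priority[seq_len(i)]),
                 integer(1))
  list(
    out  = category[best],
    # Categories seen so far, without the current best one
    rest = vapply(seq_len(n),
                  function(i) toString(category[seq_len(i)][-best[i]]),
                  character(1))
  )
}

# Evaluate the helper once per group, then blank the rows without overlap
dt[, c("out", "rest") := running_best(Priority, category), by = grp]
dt[overlap == 0, `:=`(out = NA_character_, rest = NA_character_)]
dt[, grp := NULL]
```

On this toy data the `out` column comes out as NA, b, c, c, NA, e, NA, which matches the desired table. The helper rescans rows 1..i for every row, so it is quadratic in the group size; that is fine for short groups but would need a `cummin()`-based rewrite for long ones.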

0 answers:

No answers yet.