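I want to classify categories by their priority when they overlap. When categories overlap, the column "overlap" is set to 1, and the rows are then split into groups so that the categories in each group can be evaluated according to their priority.
Which is the best option: iterating over each group with a for loop to evaluate the cases separately, or is there a vectorized function that would simplify the task?
The problem is that I have to evaluate the rows one at a time, and the only way I can see to do that is a for loop, but I would like to reuse the code I already have.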
# testing the function
library(data.table)
overlap <- c(0, 1, 1, 1, 0, 1, NA)
Priority <- c(3, 2, 1, 4, 5, 6, 7)
category <- c("a", "b", "c", "d", "e", "f", "g")
data.dt <- data.table(overlap, Priority, category)
# No overlap in the last row: there is no next value to compare against
data.dt$overlap[nrow(data.dt)] <- 0
# "grp" increments at every falling edge of "overlap" (1 -> 0) and defines the groups to evaluate
data.dt[, grp := cumsum(c(TRUE, diff(overlap) < 0))]
# Per group, row indices excluding rows where both the current and the next "overlap" are 0
# (a group's last row with "overlap" 0 has no next value and yields NA)
i1 <- data.dt[, .I[!(data.table::shift(overlap, type = "lead") == 0 & overlap == 0)], .(grp)]$V1
# Row indices where "overlap" is 0
i2 <- data.dt[, .I[overlap == 0]]
# Drop the rows where "overlap" is 0
i3 <- setdiff(i1, i2)
data.dt[i1, out := category[which.min(Priority)], .(grp)] # per group, set "out" to the category with the lowest Priority value
data.dt[i2, out := NA] #add NA where the "overlap" is 0, this is because there is no overlapping
i3 = na.omit(i3) #delete the last NA value of i3
# Per group, drop the category with the lowest Priority value and build cumulative comma-separated strings of the remaining ones
v2 <- data.dt[i1, {
  v1 <- category[-which.min(Priority)]
  sapply(seq_along(v1), function(i) toString(v1[seq_len(i)]))
}, .(grp)]$V1
# Fill "rest" where "overlap" is not 0 with the strings built above, then drop "grp" and print
data.dt[i3, rest := v2][, grp := NULL][]
I get this result:
   overlap Priority category  out    rest
1:       0        3        a <NA>    <NA>
2:       1        2        b    c       a
3:       1        1        c    c    a, b
4:       1        4        d    c a, b, d
5:       0        5        e <NA>    <NA>
6:       1        6        f    e       f
7:       0        7        g <NA>    <NA>
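The problem is in the second row: it only overlaps with the first category, yet my program shows the category from the third row.
The result I would like to obtain is the following: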
   overlap Priority category  out    rest
1:       0        3        a <NA>    <NA>
2:       1        2        b    b       a
3:       1        1        c    c    a, b
4:       1        4        d    c a, b, d
5:       0        5        e <NA>    <NA>
6:       1        6        f    e       f
7:       0        7        g <NA>    <NA>
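Reading the desired table, each overlapping row seems to be evaluated only against the rows of its run seen so far, not against the whole group. A minimal sketch of that idea follows, under some assumptions that are not in the code above: a run starts at every row where "overlap" is 0, the lowest Priority number wins (as with which.min), Priority contains no NA, and the helper name running_pick is made up for illustration. The running minimum comes from cummin, so only the "rest" strings still need a per-row step.

```r
library(data.table)

dt <- data.table(
  overlap  = c(0, 1, 1, 1, 0, 1, 0),   # last value already set to 0, as in the question
  Priority = c(3, 2, 1, 4, 5, 6, 7),
  category = c("a", "b", "c", "d", "e", "f", "g")
)

# Assumption: every row with overlap == 0 opens a new run of overlapping rows
dt[, grp := cumsum(overlap == 0)]

# Hypothetical helper: for each row, pick the best category among the rows
# of the run seen so far, and list the remaining ones in their original order
running_pick <- function(priority, category) {
  # Position of the running minimum (first occurrence wins on ties)
  best <- match(cummin(priority), priority)
  out  <- category[best]
  rest <- vapply(
    seq_along(priority),
    function(i) toString(category[seq_len(i)][-best[i]]),
    character(1)
  )
  # The first row of a run has overlap == 0, so nothing is reported for it
  out[1]  <- NA_character_
  rest[1] <- NA_character_
  list(out = out, rest = rest)
}

dt[, c("out", "rest") := running_pick(Priority, category), by = grp]
dt[, grp := NULL]
print(dt)
```

On the sample data this gives b, c, c for "out" in rows 2 to 4 and e in row 6, with "rest" equal to a, then a, b, then a, b, d, then f, which matches the desired table. If consecutive rows with "overlap" 0 should stay in one group, the original falling-edge definition of "grp" could be kept instead, but then the first-row handling in the helper would need to change.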