Identify different sequences with the same elements

Time: 2019-05-31 18:32:53

Tags: r

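I want a sequence vector that is not affected by repeated values: the counter should advance every time the value changes, even when the new value has already appeared earlier.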

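Data

group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3 )

x = c("B","B",NA,"A","B","C","D", "A","A",NA,"A","A","A", "D","A","A","D","C","D")

dad = data.frame(group, x)

Desired vector

out = c(1,1,NA,2,3,4,5, 1,1,NA,1,1,1, 1,2,2,3,4,5)

dad = cbind(dad, out)

For example, in group 1 the element "B" appears again, but because the sequence has changed in between, the numbering must continue rather than go back to the earlier value. In this case the second "B" would be 3. NA entries stay NA and do not break a run.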

1 answer:

Answer 0: (score: 2)

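An option with data.table. Convert the 'data.frame' to a 'data.table' (setDT(dad)), group by 'group', use a logical index in i to select only the rows where 'x' is not NA, and assign the run-length id (rleid) of 'x' to a new column 'ind'. Rows where 'x' is NA are excluded by i, so they keep NA in 'ind'.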

library(data.table)
setDT(dad)[!is.na(x),  ind := rleid(x), group]
dad
#    group    x ind
#1:     1    B   1
#2:     1    B   1
#3:     1 <NA>  NA
#4:     1    A   2
#5:     1    B   3
#6:     1    C   4
#7:     1    D   5
#8:     2    A   1
#9:     2    A   1
#10:    2 <NA>  NA
#11:    2    A   1
#12:    2    A   1
#13:    2    A   1
#14:    3    D   1
#15:    3    A   2
#16:    3    A   2
#17:    3    D   3
#18:    3    C   4
#19:    3    D   5
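
For readers without data.table, the same idea can be sketched in base R. This is only an illustration and not part of the answer above: run_id() is a hypothetical helper name introduced here, built on base rle(), and ave() applies it within each group. NA entries are dropped before the runs are counted, which mirrors the !is.na(x) filter.

```r
# Data and desired vector from the question
group <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
x <- c("B", "B", NA, "A", "B", "C", "D", "A", "A", NA, "A", "A", "A",
       "D", "A", "A", "D", "C", "D")
out <- c(1, 1, NA, 2, 3, 4, 5, 1, 1, NA, 1, 1, 1, 1, 2, 2, 3, 4, 5)

# run_id() is a hypothetical helper: it numbers the runs of equal values
# in v, ignoring NA entries so they get no id and do not split a run
run_id <- function(v) {
  id <- rep(NA_integer_, length(v))
  keep <- !is.na(v)
  r <- rle(v[keep])
  id[keep] <- rep(seq_along(r$lengths), r$lengths)
  id
}

# ave() is given row indices (not x itself) so the result stays integer;
# the helper is applied to the x values of each group separately
ind <- ave(seq_along(x), group, FUN = function(i) run_id(x[i]))

# Check against the desired vector: same values and same NA positions
stopifnot(all(ind == out, na.rm = TRUE),
          identical(is.na(ind), is.na(out)))
```

Passing seq_along(x) to ave() instead of x is a deliberate choice: ave() writes the results back into its first argument, so giving it the character vector would turn the ids into strings.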