Assigning ids by cluster in R

Time: 2016-10-04 02:58:30

Tags: r

I have a vector like this:

var1=c("A","A","B"," "," ","C","A","","A")

How can I create an id vector that marks which non-empty elements are adjacent to each other? Something like:

id1=c(1,1,1,0,0,2,2,0,3)

So I want to assign an ID to each cluster, with 0 for the blank elements. Is there a way to do this in R?

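2 answers:

Answer 0: (score: 2)

Here is one option with rle. We use trimws to remove leading/trailing whitespace, convert to a logical vector according to whether each element is a non-empty string (nzchar), and take the run-length encoding (rle). We then change the TRUE entries of the "values" vector in the list 'rl' to a sequence, and use rep to replicate the values by the lengths.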

rl <- rle(nzchar(trimws(var1)))
rl$values[rl$values] <- seq_along(rl$values[rl$values])
rep(rl$values, rl$lengths)
#[1] 1 1 1 0 0 2 2 0 3

Data

var1=c("A","A","B"," "," ","C","A","","A")
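To see what each step of the rle approach contributes, the intermediate values can be inspected one at a time. This is only a sketch on the same data; the names nonempty and id1 are chosen here for illustration, and inverse.rle is used as an equivalent of the rep call.

```r
var1 <- c("A", "A", "B", " ", " ", "C", "A", "", "A")

# TRUE where the element is non-empty after trimming whitespace
nonempty <- nzchar(trimws(var1))

# Run-length encoding: one entry per run of identical values
rl <- rle(nonempty)
rl$values   # TRUE FALSE TRUE FALSE TRUE
rl$lengths  # 3 2 2 1 1

# Number the TRUE runs 1, 2, 3, ...; the FALSE runs are coerced to 0
rl$values[rl$values] <- seq_len(sum(rl$values))

# Expand the runs back to the original length
id1 <- inverse.rle(rl)
id1
```

Because the FALSE runs turn into 0 when the logical vector is coerced to integer, no separate step is needed to zero out the blank positions.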

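Answer 1: (score: 2)

We can apply cumsum to the diff of var1 != "" to generate a sequence that numbers the clusters (empty strings included), and then replace the positions of the empty strings with 0: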

replace(cumsum(c(TRUE, diff(var1 != "") == 1)), var1 == "", 0)

gives:

# [1] 1 1 1 0 0 2 2 0 3

for:

var1=c("A","A","B","","","C","A","","A")
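The cumsum/diff one-liner above can be unpacked into its intermediate steps. The following is a sketch on the same data; the names nonempty, starts and running are introduced here only for illustration.

```r
var1 <- c("A", "A", "B", "", "", "C", "A", "", "A")

# TRUE for non-empty strings
nonempty <- var1 != ""

# diff() treats logicals as 0/1, so +1 marks an empty -> non-empty transition
starts <- diff(nonempty) == 1

# Prepend TRUE for the first element, then count the cluster starts seen so far
running <- cumsum(c(TRUE, starts))

# The empty positions still carry the previous cluster number; set them to 0
id1 <- replace(running, !nonempty, 0)
id1
```

Note that this compares against "" exactly, so whitespace-only strings such as " " would count as non-empty unless they are trimmed first.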

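This assumes that var1 does not start with an empty string. To generalize to that case, we can check the first element of var1 and use that condition as the initial value: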

replace(cumsum(c(var1[1] != "", diff(var1 != "") == 1)), var1 == "", 0)

gives:

# [1] 0 1 1 1 0 0 2 2 0 3

with:

var1=c("", "A","A","B","","","C","A","","A")
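The two answers can be combined into one small helper. What follows is a minimal sketch, not part of either answer: cluster_id is a hypothetical name, and treating whitespace-only strings as empty (via trimws, as in Answer 0) is an assumption based on the data in the question.

```r
# Hypothetical helper: number runs of non-empty strings, 0 for blanks
cluster_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  # Assumption: whitespace-only strings count as empty
  nonempty <- nzchar(trimws(x))
  # A cluster starts at a non-empty first element or at an
  # empty -> non-empty transition
  starts <- c(nonempty[1], diff(nonempty) == 1)
  replace(cumsum(starts), !nonempty, 0L)
}

cluster_id(c("A", "A", "B", " ", " ", "C", "A", "", "A"))
cluster_id(c("", "A", "A", "B", "", "", "C", "A", "", "A"))
```

Using the first element's own status as the initial value is what lets the same code handle vectors that begin with a blank.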