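I have a vector like this:
var1=c("A","A","B"," "," ","C","A","","A")
How can I create an id vector that indicates which elements are adjacent? Something like:
id1=c(1,1,1,0,0,2,2,0,3)
So I want to assign an ID to each cluster of consecutive non-blank elements. Is there a way to do this in R?

Answer 0 (score: 2)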
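Here is one option with rle. We remove leading/trailing whitespace with trimws, convert to a logical vector according to whether each element is a non-empty string (nzchar), and take the run-length encoding (rle). We then change the "values" vector in the list rl to a sequence for the TRUE runs, and replicate values by lengths: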
rl <- rle(nzchar(trimws(var1)))
rl$values[rl$values] <- seq_along(rl$values[rl$values])
rep(rl$values, rl$lengths)
#[1] 1 1 1 0 0 2 2 0 3
for:
var1=c("A","A","B"," "," ","C","A","","A")
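
To make the rle steps above easier to follow, here is a minimal sketch that prints each intermediate value for the question's data. The variable names nonblank and id1 are only illustrative, and seq_len(sum(...)) is used as an equivalent way to build the sequence.

```r
# Data from the question; elements 4 and 5 are single spaces, element 8 is empty
var1 <- c("A", "A", "B", " ", " ", "C", "A", "", "A")

# TRUE where an element is non-blank after trimming whitespace
nonblank <- nzchar(trimws(var1))
# TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE

# Run-length encoding of that logical vector
rl <- rle(nonblank)
rl$lengths  # 3 2 2 1 1
rl$values   # TRUE FALSE TRUE FALSE TRUE

# Number the TRUE runs 1, 2, 3; the remaining FALSE entries are coerced to 0
rl$values[rl$values] <- seq_len(sum(rl$values))
rl$values   # 1 0 2 0 3

# Expand each run back to its original length
id1 <- rep(rl$values, rl$lengths)
id1         # 1 1 1 0 0 2 2 0 3
```

Because blank runs keep the value FALSE before the assignment, they end up as 0 without any extra step.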

Answer 1 (score: 2)
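We can apply cumsum to the diff of var1 != "" to generate a sequence that numbers the clusters (blank positions still carry the number of the preceding cluster), and then replace the empty-string positions with 0: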
replace(cumsum(c(TRUE, diff(var1 != "") == 1)), var1 == "", 0)
gives:
# [1] 1 1 1 0 0 2 2 0 3
for:
var1=c("A","A","B","","","C","A","","A")
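
The one-liner above packs several steps together. As a rough sketch, it can be unrolled as follows; the names filled, steps, starts, running and ids are illustrative and do not come from the answer.

```r
var1 <- c("A", "A", "B", "", "", "C", "A", "", "A")

# TRUE where the element is not the empty string
filled <- var1 != ""
# TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE

# diff() on a logical vector: 1 at blank -> filled, -1 at filled -> blank, else 0
steps <- diff(filled)
# 0 0 -1 0 1 0 -1 1

# A cluster starts at position 1 (assumed non-blank) and wherever steps == 1
starts <- c(TRUE, steps == 1)

# Running count of cluster starts; blanks still carry the previous count
running <- cumsum(starts)
# 1 1 1 1 1 2 2 2 3

# Zero out the blank positions
ids <- replace(running, !filled, 0)
ids
# 1 1 1 0 0 2 2 0 3
```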
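This assumes var1 does not start with an empty string. To generalize to that case, we can check the first element of var1 and use that condition as the initial value: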
replace(cumsum(c(var1[1] != "", diff(var1 != "") == 1)), var1 == "", 0)
gives:
# [1] 0 1 1 1 0 0 2 2 0 3
with:
var1=c("", "A","A","B","","","C","A","","A")
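
Note that this answer compares against "" only, so whitespace-only elements such as " " in the question's data would count as non-blank. A possible way to combine it with the trimming from the first answer is sketched below; cluster_ids is a hypothetical helper name, and the sketch assumes a character vector of length at least 1 with no NA values.

```r
# Hypothetical helper: the name and the trimming step are assumptions,
# not part of either answer
cluster_ids <- function(x) {
  # Treat whitespace-only strings as blank, as the rle answer does
  filled <- nzchar(trimws(x))
  # The first element starts a cluster only if it is non-blank
  starts <- c(filled[1], diff(filled) == 1)
  # Count cluster starts, then zero out the blank positions
  replace(cumsum(starts), !filled, 0)
}

cluster_ids(c("", "A", "A", "B", " ", " ", "C", "A", "", "A"))
# 0 1 1 1 0 0 2 2 0 3
```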