Replacing a gappy numeric sequence with a consecutive one in a data.table

Date: 2018-10-29 20:32:47

Tags: r

I have a question about R and changing the values of a numeric sequence. I have a column in a data.table that looks like the id column of X:

X <- data.table(id = c("103", "103", "103", "104", "104", "160", "160"), 
content = c("I", "don't", "know", "some", "more", "words", "."))

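I want to replace the id values with sequential values, so that the sequence starts at a different point and the gaps between values disappear. In real life there will be thousands of id values, so grep-ing them one by one is not an option.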

So what I want to achieve is this:

Y <- data.table(id = c("0", "0", "0", "1", "1", "2", "2"), 
content = c("I", "don't", "know", "some", "more", "words", "."))

Any hints are welcome, because I have no idea how to start. Thanks a lot in advance!

2 answers:

Answer 0 (score: 0)

We can convert 'id' to a factor and then coerce it to integer:

X[, id :=  as.character(as.integer(factor(id)) - 1)]
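One detail worth knowing about the factor approach: factor() sorts its levels, so the codes follow the sorted (string) order of id rather than the order in which ids first appear. For the ids in X the two orders coincide, but they need not when ids have different lengths. A minimal check on a hypothetical table Z, which also shows levels = unique(id) as one way to keep the order of appearance:

```r
library(data.table)

# Hypothetical ids whose string sort order differs from their order of appearance
Z <- data.table(id = c("99", "99", "103", "103"))

# factor() sorts the levels as strings: "103" < "99", so "103" gets code 0
Z[, by_factor := as.integer(factor(id)) - 1L]

# levels = unique(id) keeps the order of first appearance: "99" gets code 0
Z[, by_appearance := as.integer(factor(id, levels = unique(id))) - 1L]

print(Z)
```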

Or use match:

X[, id := as.character(match(id, unique(id)) - 1)]
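To see what the match() call does, it can be unpacked step by step on a plain character vector. This sketch reuses the ids from the question; the intermediate names u and pos exist only for illustration:

```r
# Base R only: the same ids as in the question's X
ids <- c("103", "103", "103", "104", "104", "160", "160")

# Distinct ids in order of first appearance: "103" "104" "160"
u <- unique(ids)

# Position of each id within u: 1 1 1 2 2 3 3
pos <- match(ids, u)

# Shift so the numbering starts at 0 and convert back to character, as in Y
new_id <- as.character(pos - 1L)
print(new_id)
```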

Another option is .GRP:

X[, id := as.character(.GRP - 1), by = id]

identical(X, Y)
#[1] TRUE
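If the original ids are still needed later (for example to map results back when there are thousands of ids), a variant of the .GRP approach writes the new number to a separate column and keeps a lookup table. Because `by` (unlike `keyby`) keeps groups in order of first appearance, the numbering here matches the match() approach. This is a sketch; dt, new_id and id_map are illustrative names:

```r
library(data.table)

# A fresh copy of the question's data, so X itself is left untouched
dt <- data.table(id = c("103", "103", "103", "104", "104", "160", "160"),
                 content = c("I", "don't", "know", "some", "more", "words", "."))

# .GRP is the group counter (1, 2, ...); store it in a new column instead of overwriting id
dt[, new_id := .GRP - 1L, by = id]

# One row per original id: a lookup table from old values to new values
id_map <- unique(dt[, .(id, new_id)])
print(id_map)
```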

Or with the tidyverse:

library(tidyverse)
X %>%
   mutate(id = as.character(match(id, unique(id)) - 1))

# passing grouping columns to group_indices() is deprecated as of dplyr 1.0.0
X %>% 
  mutate(id = as.character(group_indices(., id) - 1))

X %>% 
   mutate(id = as.character(cumsum(id != lag(id, default = first(id)))))
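The cumsum() approach counts how many times id has changed from one row to the next, so it assumes that rows sharing an id are adjacent, as they are in X. The same logic can be written in base R without lag(); the comparison with data.table::rleid() at the end is only a sanity check:

```r
library(data.table)  # only needed for the rleid() comparison at the end

ids <- c("103", "103", "103", "104", "104", "160", "160")

# Previous row's id; the first row is paired with itself, like lag(default = first(id))
prev <- c(ids[1], ids[-length(ids)])

# TRUE wherever the id differs from the previous row
changed <- ids != prev

# Running count of changes: 0 for the first run, 1 for the next, and so on
run_id <- cumsum(changed)
print(run_id)

# Same numbering as data.table::rleid(), shifted to start at 0
stopifnot(all(run_id == rleid(ids) - 1L))
```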

Or with base R:

X$id <- as.character(match(X$id, unique(X$id)) - 1)
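Since the question also asks to change the starting point, the base R expression can be wrapped in a small function with a start argument. renumber_ids() is a hypothetical helper written for this example, not an existing function in base R or data.table:

```r
# Hypothetical helper: maps each distinct value of x to consecutive integers,
# in order of first appearance, beginning at `start`
renumber_ids <- function(x, start = 0L) {
  match(x, unique(x)) - 1L + as.integer(start)
}

df <- data.frame(id = c("103", "103", "103", "104", "104", "160", "160"),
                 content = c("I", "don't", "know", "some", "more", "words", "."),
                 stringsAsFactors = FALSE)

df$id <- as.character(renumber_ids(df$id))       # "0" "0" "0" "1" "1" "2" "2"
renumber_ids(c("103", "104", "160"), start = 1)  # 1 2 3
```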

Answer 1 (score: 0)

Another option is rleid, which numbers runs of consecutive identical values:

library(data.table)
X[, id := rleid(id) - 1L][]
#   id content
#1:  0       I
#2:  0   don't
#3:  0    know
#4:  1    some
#5:  1    more
#6:  2   words
#7:  2       .

If you want id to stay a character column, use:

X[, id := as.character(rleid(id) - 1L)]
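Because rleid() counts runs, it agrees with match() on X only because rows sharing an id are adjacent there. When an id can reappear further down, the two give different results; a minimal contrast on a hypothetical table W:

```r
library(data.table)

# Hypothetical input in which id "103" reappears after "104"
W <- data.table(id = c("103", "103", "104", "103"))

# rleid() starts a new number every time the value changes between rows
W[, by_rleid := rleid(id) - 1L]

# match() gives every row with the same id the same number
W[, by_match := match(id, unique(id)) - 1L]

print(W)
```

If the rows may be unsorted, either sort first (for example with setorder(W, id)) or use one of the match() or .GRP approaches.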