I have a question about R and changing the values of a numeric sequence. I have a column in a data.table that looks like the one in X:
X <- data.table(id = c("103", "103", "103", "104", "104", "160", "160"),
content = c("I", "don't", "know", "some", "more", "words", "."))
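I want to replace the id values with sequential values, so that the starting point changes and the gaps in between disappear. In real life there will be thousands of id values, so grep-ing for each of them is not an option.
So what I want to achieve is something like this: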
Y <- data.table(id = c("0", "0", "0", "1", "1", "2", "2"),
content = c("I", "don't", "know", "some", "more", "words", "."))
Any hints would be welcome, as I don't know how to start. Thank you very much in advance!
Answer 0 (score: 0)
We can convert 'id' to a factor and then coerce it to integer
X[, id := as.character(as.integer(factor(id)) - 1)]
Or use match
X[, id := as.character(match(id, unique(id)) - 1)]
Another option is .GRP
X[, id := as.character(.GRP -1) , id]
identical(X, Y)
#[1] TRUE
Or use the tidyverse
library(tidyverse)
X %>%
mutate(id = as.character(match(id, unique(id)) - 1))
Or
X %>%
mutate(id = as.character(group_indices(., id) - 1))
Or
X %>%
mutate(id = as.character(cumsum(id != lag(id, default = first(id)))))
Or with base R
X$id <- as.character(match(X$id, unique(X$id)) - 1)
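One difference between the factor and match approaches is worth noting. The sketch below uses a hypothetical vector `ids` (not from the question) whose values are not in sorted order. As far as I can tell, `factor()` sorts its levels, so the numbering follows sorted order, while `match()` against `unique()` numbers the ids in order of first appearance. On the question's data, which is already sorted, both give the same result.

```r
# Hypothetical ids that are NOT in sorted order (assumption for illustration)
ids <- c("160", "103", "160", "104")

# factor() sorts its levels ("103", "104", "160"), so codes follow sorted order
by_sorted <- as.integer(factor(ids)) - 1L
by_sorted
# [1] 2 0 2 1

# match() against unique() numbers ids by order of first appearance
by_first <- match(ids, unique(ids)) - 1L
by_first
# [1] 0 1 0 2
```

Because id is a character column, the sort used by `factor()` is a string sort, so "1000" would come before "103". If the order of first appearance matters, `match()` is probably the safer choice.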
Answer 1 (score: 0)
Another option is rleid
library(data.table)
X[, id := rleid(id) - 1L][]
#   id content
#1:  0       I
#2:  0   don't
#3:  0    know
#4:  1    some
#5:  1    more
#6:  2   words
#7:  2       .
If you want id to be of character type, do this
X[, id := as.character(rleid(id) - 1L)]
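Note that `rleid()` numbers consecutive runs rather than distinct values. The sketch below uses a hypothetical table `Z` (not from the question) in which an id reappears after a different one. In that case the run-based index and a value-based index such as `match()` differ. For data like the question's, where equal ids are always adjacent, the two agree.

```r
library(data.table)

# Hypothetical data: id "103" reappears after "104" (assumption for illustration)
Z <- data.table(id = c("103", "103", "104", "103"))

# match() numbers distinct values by first appearance
Z[, by_value := match(id, unique(id)) - 1L]

# rleid() numbers consecutive runs, so the second run of "103" gets a new index
Z[, by_run := rleid(id) - 1L]

Z$by_value
# [1] 0 0 1 0
Z$by_run
# [1] 0 0 1 2
```

If the same id can occur in non-adjacent rows and should keep a single number, a value-based approach is likely the better fit.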