I have a dataset, and I want to assign a sequence number to each run of consecutive repeated values in a column, for example:
variable_1
x
x
y
y
x
x
x
z
z
z
How can I get a result like this:
variable_1 sequence
x 1
y 2
x 3
z 4
I tried using unique, but then I lose the sequence number for the second occurrence of x.
Answer 0: (score: 2)
A solution using dplyr and data.table.
library(dplyr)
library(data.table)
df2 <- df %>%
mutate(sequence = rleid(variable_1)) %>%
distinct()
df2
# variable_1 sequence
# 1 x 1
# 2 y 2
# 3 x 3
# 4 z 4
Data
df <- read.table(text = "
variable_1
x
x
y
y
x
x
x
z
z
z
", header = TRUE, stringsAsFactors = FALSE)
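To see what `rleid` contributes in the pipeline above, the same run IDs can be sketched in base R with `rle`. This is an illustrative equivalent, not part of the original answer; the names `x`, `r`, `run_id`, and `result` are chosen for the example only.

```r
# Example input: the same values as variable_1 in the question
x <- c("x", "x", "y", "y", "x", "x", "x", "z", "z", "z")

# rle() splits the vector into runs of consecutive equal values
r <- rle(x)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run;
# this gives one ID per row, like data.table::rleid(x)
run_id <- rep(seq_along(r$lengths), times = r$lengths)
run_id
# [1] 1 1 2 2 3 3 3 4 4 4

# One row per run: the collapsed table asked for in the question
result <- data.frame(variable_1 = r$values,
                     sequence = seq_along(r$values),
                     stringsAsFactors = FALSE)
result
```

Because each run gets its own ID, the second block of x rows is numbered 3 rather than being merged with the first, which is exactly what `unique` alone cannot do.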
Answer 1: (score: 1)
A dplyr solution:
library(dplyr)
df = read.table(text = "
variable_1
x
x
y
y
x
x
x
z
z
z
", header=T, stringsAsFactors=F)
df %>%
mutate(flag = if_else(variable_1 != lag(variable_1), 1, 0, missing = 1), # flag row when variable changes
sequence = cumsum(flag)) %>% # create a group using the flags
distinct(variable_1, sequence) # get unique values
# variable_1 sequence
# 1 x 1
# 2 y 2
# 3 x 3
# 4 z 4
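The flag-and-cumsum idea may be easier to follow with the intermediate vectors written out. The sketch below uses plain vectors instead of the dplyr pipeline; `x`, `flag`, and `sequence` are illustrative names, and the shift is done by indexing rather than with `lag`.

```r
# Example input: the same values as variable_1 in the question
x <- c("x", "x", "y", "y", "x", "x", "x", "z", "z", "z")

# Compare each element with the one before it; the first element has no
# predecessor, so it is flagged by hand (the role of missing = 1 above)
flag <- as.integer(c(TRUE, x[-1] != x[-length(x)]))
flag
# [1] 1 0 1 0 1 0 0 1 0 0

# The running total of the flags increases by one at every change of value
sequence <- cumsum(flag)
sequence
# [1] 1 1 2 2 3 3 3 4 4 4
```

Every row in the same run ends up with the same number, so dropping duplicate (value, number) pairs leaves one row per run.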
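Answer 2: (score: 1)
In base R (the comparison is done by indexing, because without dplyr loaded `lag` resolves to `stats::lag`, which does not shift a plain vector):
# TRUE for the first row and for every row whose value differs from the previous one
v <- c(TRUE, df$variable_1[-1] != df$variable_1[-nrow(df)])
# running count of changes gives the run number for each row
df$sequence <- cumsum(v)
# keep the first row of each (value, sequence) pair
df[!duplicated(df), ]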
# variable_1 sequence
# 1 x 1
# 3 y 2
# 5 x 3
# 8 z 4