I am using R, and this is my data frame:
Died.At <- c(22,40,72,41)
Writer.At <- c(16, 18, 36, 36)
First.Name <- c("John", "John", "Walt", "Walt")
Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
Sex <- c("MALE", "MALE", "MALE", "MALE")
writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex)
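I want to add a new column named id based on the first name, so John and Walt in this case. I know I can easily do this with
id <- c("1", "1", "2", "2")
but I have a large data set to deal with. Also, once a name has ended it never shows up again, so after Walt there will be no more John. Can someone help me with this?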
Answer 0 (score: 2)
We can try
library(data.table)
setDT(writers_df)[, id:= .GRP, First.Name]
Or a base R option
writers_df$id <- cumsum(!duplicated(writers_df$First.Name))
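The cumsum(!duplicated(...)) line above relies on the asker's statement that a name never comes back once it has ended. As a rough sketch of what happens when that assumption does not hold, the snippet below compares it with match() against unique(), which numbers names by order of first appearance. The vector names (first_name, reappearing, and the id_* results) are made up for illustration and are not part of the original question.

```r
# Base R only; first_name mirrors the First.Name column from the question.
first_name <- c("John", "John", "Walt", "Walt")

# Approach from the answer: bump the counter at each first occurrence.
id_cumsum <- cumsum(!duplicated(first_name))

# Alternative: position of each name among the distinct names,
# taken in order of first appearance.
id_match <- match(first_name, unique(first_name))

# On the asker's data the two approaches agree.
same_on_question_data <- identical(id_cumsum, id_match)

# Hypothetical input where a name reappears later (NOT the asker's case).
reappearing <- c("John", "Walt", "John")
id_cumsum_re <- cumsum(!duplicated(reappearing))         # second John gets Walt's id
id_match_re  <- match(reappearing, unique(reappearing))  # second John keeps John's id
```

If the names are guaranteed to be contiguous, as in the question, either form should do; match() is just the more defensive choice when that guarantee is uncertain.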
Or using dplyr
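library(dplyr)
# group_indices_() is deprecated in current dplyr; cur_group_id() (dplyr >= 1.0.0) replaces it
writers_df %>%
  group_by(First.Name) %>%
  mutate(id = cur_group_id()) %>%
  ungroup()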