I would like to add an ordered ID (by date) for each group in a data frame. I can do this using dplyr (see "R - add column that counts sequentially within groups but repeats for duplicates"):
library(dplyr)

# Example data
date <- rep(c("2016-10-06 11:56:00", "2016-10-05 11:56:00", "2016-10-05 11:56:00", "2016-10-07 11:56:00"), 2)
date <- as.POSIXct(date)
group <- c(rep("A", 4), rep("B", 4))
df <- data.frame(group, date)

# dplyr - dense_rank
df2 <- df %>%
  group_by(group) %>%
  mutate(m.test = dense_rank(date))
   group                date m.test
  <fctr>              <dttm>  <int>
1      A 2016-10-06 11:56:00      2
2      A 2016-10-05 11:56:00      1
3      A 2016-10-05 11:56:00      1
4      A 2016-10-07 11:56:00      3
5      B 2016-10-06 11:56:00      2
6      B 2016-10-05 11:56:00      1
7      B 2016-10-05 11:56:00      1
8      B 2016-10-07 11:56:00      3
So my new column m.test ranks date within each group. If I use data.table and rleid instead, it doesn't seem to work (05/10 is ranked after 06/10).

Is my syntax wrong?
Answer 0 (score: 1)

Thanks to @docendo discimus, the correct way to do this with data.table is frank(..., ties.method = "dense"):
df4 <- setDT(df)[, m.test := frank(date, ties.method = "dense"), by = group]
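To see why rleid gives a different result, here is a minimal sketch that rebuilds the example data and puts both approaches side by side. The column names by_run and by_rank, and the explicit tz = "UTC", are my own choices for illustration and are not from the question. rleid numbers runs of equal consecutive values in the order the rows appear, so it never looks at whether one date is earlier than another, whereas frank ranks by value.

```r
library(data.table)

# Same example data as in the question (tz fixed only for reproducibility)
date <- as.POSIXct(
  rep(c("2016-10-06 11:56:00", "2016-10-05 11:56:00",
        "2016-10-05 11:56:00", "2016-10-07 11:56:00"), 2),
  tz = "UTC"
)
dt <- data.table(group = rep(c("A", "B"), each = 4), date = date)

# rleid: a new id starts whenever the value changes from the previous row,
# so the ids follow row order, not date order
dt[, by_run := rleid(date), by = group]

# frank with ties.method = "dense": ids follow the sorted order of date,
# with ties sharing an id and no gaps
dt[, by_rank := frank(date, ties.method = "dense"), by = group]

print(dt)
# Within each group: by_run is 1 2 2 3, by_rank is 2 1 1 3
```

If the rows were first sorted by group and date (for example with setorder(dt, group, date)), rleid would produce the same ids as the dense rank, but at the cost of reordering the table.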