I'm working with a data frame that contains several groups, each covering a range of years. Like this:
df <- data.frame(group = c(rep("aaa", 3), rep("bbb", 3), rep("ccc", 3)), year = c(2016:2018))
df
group year
1 aaa 2016
2 aaa 2017
3 aaa 2018
4 bbb 2016
5 bbb 2017
6 bbb 2018
7 ccc 2016
8 ccc 2017
9 ccc 2018
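What I'd like to do is create a column (generation) that assigns a value based on the year, where the most recent generation is generation 0 and earlier generations count down from there. Like this: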
group year generation
1 aaa 2018 0
2 bbb 2018 0
3 ccc 2018 0
4 aaa 2017 -1
5 bbb 2017 -1
6 ccc 2017 -1
7 aaa 2016 -2
8 bbb 2016 -2
9 ccc 2016 -2
I think it has to be something like the following, but this gives me a range of 1 to 3 instead of -2 to 0:
library(dplyr)
df2 <- df %>%
group_by(group) %>%
arrange(desc(year)) %>%
mutate(generation = min_rank(year))
df2
group year generation
1 aaa 2018 3
2 bbb 2018 3
3 ccc 2018 3
4 aaa 2017 2
5 bbb 2017 2
6 ccc 2017 2
7 aaa 2016 1
8 bbb 2016 1
9 ccc 2016 1
Any ideas on how to get the range I want? Thanks!
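Answer 0 (score: 6)
If year is not always consecutive, we can take order() of year and subtract it from the number of rows in the group (then negate the result).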
library(dplyr)
df %>%
group_by(group) %>%
mutate(generation = -(n() - order(year))) %>%
arrange(desc(year))
# group year generation
# <fct> <int> <int>
#1 aaa 2018 0
#2 bbb 2018 0
#3 ccc 2018 0
#4 aaa 2017 -1
#5 bbb 2017 -1
#6 ccc 2017 -1
#7 aaa 2016 -2
#8 bbb 2016 -2
#9 ccc 2016 -2
Using base R:
with(df, ave(year, group, FUN = function(x) -(length(x) - order(x))))
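One caveat worth checking: order() returns the positions that would sort a vector, which equals each value's rank only when the rows are already sorted by year within the group, as they are in the question's data. A minimal sketch of the difference, using a made-up data frame (df_unsorted, by_order and by_rank are hypothetical names) and rank() swapped in as an alternative:

```r
# Hypothetical data: the rows of the group are deliberately not sorted by year
df_unsorted <- data.frame(group = rep("aaa", 3), year = c(2018L, 2016L, 2017L))

# order() gives the sorting permutation, not the rank of each value
by_order <- with(df_unsorted,
                 ave(year, group, FUN = function(x) -(length(x) - order(x))))

# rank() gives each value's position in the sorted vector
by_rank <- with(df_unsorted,
                ave(year, group, FUN = function(x) rank(x) - length(x)))

cbind(df_unsorted, by_order, by_rank)
#   group year by_order by_rank
# 1   aaa 2018       -1       0
# 2   aaa 2016        0      -2
# 3   aaa 2017       -2      -1
```

Only by_rank assigns 0 to 2018 here, so rank() (or dplyr's min_rank()) looks like the safer choice when the input order is not guaranteed.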
If year is always consecutive, we can subtract the group's max year from each year.
df %>%
group_by(group) %>%
mutate(generation = year - max(year))
and
with(df, year - ave(year, group, FUN = max))
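The two approaches only disagree when years are missing. A small illustration on a made-up vector (the values and the names years, by_max and by_rank are hypothetical, not taken from the question):

```r
# Hypothetical years with gaps: 2015 and 2017 are missing
years <- c(2014L, 2016L, 2018L)

# Subtracting the maximum measures the distance in calendar years
by_max <- years - max(years)
by_max   # -4 -2  0

# Ranking counts positions back from the latest year, ignoring gaps
by_rank <- rank(years) - length(years)
by_rank  # -2 -1  0
```

Which result is wanted depends on whether a generation means a calendar year or the next available observation.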
Answer 1 (score: 0)
Using transform.
transform(df[order(-df$year), ],
generation=factor(year, labels=-(2:0)))
# group year generation
# 3 aaa 2018 0
# 6 bbb 2018 0
# 9 ccc 2018 0
# 2 aaa 2017 -1
# 5 bbb 2017 -1
# 8 ccc 2017 -1
# 1 aaa 2016 -2
# 4 bbb 2016 -2
# 7 ccc 2016 -2
If the data are slightly different, e.g. the 2017 row for group bbb is missing,
df2 <- df[-5, ]
we can plug ave into it to get the correct generation counts.
transform(df2[order(-df2$year), ],
generation=factor(
with(df2, ave(as.numeric(group), year, FUN=seq)),
labels=-(0:2)))
# group year generation
# 3 aaa 2018 0
# 6 bbb 2018 0
# 9 ccc 2018 0
# 2 aaa 2017 -1
# 8 ccc 2017 -1
# 1 aaa 2016 -2
# 4 bbb 2016 -1
# 7 ccc 2016 -2
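The call above computes the ave() vector in the original row order of df2 but attaches it to the reordered rows, so it may be safer to compute the column first and sort afterwards. A hedged alternative sketch, assuming each year appears at most once per group (with ties, rank() would return averaged ranks); the data frame is rebuilt here only to keep the example self-contained:

```r
# Same rows as df2: group bbb has no row for 2017
df2 <- data.frame(
  group = c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc", "ccc", "ccc"),
  year  = c(2016L, 2017L, 2018L, 2016L, 2018L, 2016L, 2017L, 2018L)
)

# Rank the years within each group, then shift so the latest year is 0
df2$generation <- with(df2, ave(year, group, FUN = function(x) rank(x) - length(x)))

# Sort only after the column is attached, so values stay with their rows
df2[order(-df2$year, df2$group), ]
```

This should reproduce the generations shown above (bbb 2016 becomes -1), with generation stored as a number rather than a factor.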
Data
df <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("aaa", "bbb", "ccc"), class = "factor"),
year = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L, 2016L,
2017L, 2018L)), class = "data.frame", row.names = c(NA, -9L
))
Answer 2 (score: 0)
With data.table:
library(data.table)
setDT(df)[, generation := year - max(year), group][order(- year)]
# group year generation
#1: aaa 2018 0
#2: bbb 2018 0
#3: ccc 2018 0
#4: aaa 2017 -1
#5: bbb 2017 -1
#6: ccc 2017 -1
#7: aaa 2016 -2
#8: bbb 2016 -2
#9: ccc 2016 -2