Numbering generations in reverse chronological order within groups of a data frame (producing 0, -1, -2, etc.)

Time: 2019-05-09 09:24:02

Tags: r dataframe

I'm working with a data frame that contains several groups, each spanning a range of years. Like this:

df <- data.frame(group = c(rep("aaa", 3), rep("bbb", 3), rep("ccc", 3)), year = c(2016:2018))
df  

   group  year  
1  aaa    2016  
2  aaa    2017
3  aaa    2018
4  bbb    2016
5  bbb    2017
6  bbb    2018
7  ccc    2016
8  ccc    2017
9  ccc    2018  

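What I want to do is create a column (generation) that assigns a value based on year, where the most recent generation is generation 0 and earlier generations count down from there. Like this: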

   group  year  generation
1  aaa    2018  0
2  bbb    2018  0
3  ccc    2018  0
4  aaa    2017  -1
5  bbb    2017  -1
6  ccc    2017  -1 
7  aaa    2016  -2
8  bbb    2016  -2
9  ccc    2016  -2

I think it has to be something like the following, but that gives me a range of 1 to 3 instead of -2 to 0:

library(dplyr)

df2 <- df %>% 
  group_by(group) %>% 
  arrange(desc(year)) %>% 
  mutate(generation = min_rank(year))
df2

   group  year  generation
1  aaa    2018  3
2  bbb    2018  3
3  ccc    2018  3
4  aaa    2017  2
5  bbb    2017  2
6  ccc    2017  2 
7  aaa    2016  1
8  bbb    2016  1
9  ccc    2016  1

Any ideas on how to get the range I want? Thanks!

3 answers:

Answer 0 (score: 6)

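If year is not always consecutive, we can take order(year) within each group, subtract it from the number of rows in the group, and negate the result (this relies on each group's rows already being sorted by year).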

library(dplyr)
df %>%
  group_by(group) %>%
  mutate(generation = -(n() - order(year))) %>%
  arrange(desc(year))

# group  year generation
#  <fct> <int>      <int>
#1 aaa    2018          0
#2 bbb    2018          0
#3 ccc    2018          0
#4 aaa    2017         -1
#5 bbb    2017         -1
#6 ccc    2017         -1
#7 aaa    2016         -2
#8 bbb    2016         -2
#9 ccc    2016         -2

Using base R:

with(df, ave(year, group, FUN = function(x) -(length(x) - order(x))))
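
One caveat, offered as a hedged aside rather than as part of the original answer: order() returns the positions that would sort the vector, which equals each element's rank only when the group is already sorted by year. Below is a minimal base R sketch that uses rank() instead. The data frame df_shuffled and the names gen_order and gen_rank are made up for illustration, and the sketch assumes each year occurs at most once per group (otherwise rank() averages ties).

```r
# Hypothetical example data: rows inside each group are deliberately
# out of chronological order.
df_shuffled <- data.frame(
  group = rep(c("aaa", "bbb"), each = 3),
  year  = c(2018L, 2016L, 2017L, 2017L, 2018L, 2016L)
)

# order()-based version from the answer: only correct for sorted groups.
gen_order <- with(df_shuffled,
                  ave(year, group, FUN = function(x) -(length(x) - order(x))))

# rank()-based version: rank 1 is the earliest year, so subtracting the
# group size maps the latest year to 0 regardless of row order.
gen_rank <- with(df_shuffled,
                 ave(year, group, FUN = function(x) rank(x) - length(x)))

cbind(df_shuffled, gen_order, gen_rank)
```

In this sketch the 2018 rows get 0 only in the gen_rank column; on data that is already sorted within each group, both versions agree.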

If year is consecutive, we can subtract the group's max(year) from year:

df %>%
  group_by(group) %>%
  mutate(generation = year - max(year))

with(df, year - ave(year, group, FUN = max))
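
To make the "consecutive" condition concrete, here is a small sketch with made-up data (df_gap, by_max, and by_rank are hypothetical names) in which one group skips some years. Subtracting max(year) yields the distance in years from the latest year, whereas a rank-based count yields the position of each generation. Which of the two is wanted depends on the use case.

```r
# Hypothetical data: group bbb has no rows for 2015 and 2016.
df_gap <- data.frame(
  group = rep(c("aaa", "bbb"), each = 3),
  year  = c(2016L, 2017L, 2018L, 2014L, 2017L, 2018L)
)

# Distance in years from the group's latest year.
df_gap$by_max <- with(df_gap, year - ave(year, group, FUN = max))

# Position counted back from the latest year (assumes one row per
# group and year, so rank() produces no ties).
df_gap$by_rank <- with(df_gap,
                       ave(year, group, FUN = function(x) rank(x) - length(x)))

df_gap
```

For group bbb the year 2014 comes out as -4 in by_max but -2 in by_rank; for group aaa, whose years are consecutive, the two columns are identical.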

Answer 1 (score: 0)

Using transform:

transform(df[order(-df$year), ], 
          generation=factor(year, labels=-(2:0)))
#   group year generation
# 3   aaa 2018          0
# 6   bbb 2018          0
# 9   ccc 2018          0
# 2   aaa 2017         -1
# 5   bbb 2017         -1
# 8   ccc 2017         -1
# 1   aaa 2016         -2
# 4   bbb 2016         -2
# 7   ccc 2016         -2

If the data are slightly different, e.g. the 2017 row for group bbb is missing:

df2 <- df[-5, ]

we can plug ave into it to get the correct generation counts.

transform(df2[order(-df2$year), ],
          generation=factor(
            with(df2, ave(as.numeric(group), year, FUN=seq)), 
            labels=-(0:2)))
#   group year generation
# 3   aaa 2018          0
# 6   bbb 2018          0
# 9   ccc 2018          0
# 2   aaa 2017         -1
# 8   ccc 2017         -1
# 1   aaa 2016         -2
# 4   bbb 2016         -1
# 7   ccc 2016         -2
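
Note that factor() makes generation a factor whose labels merely look like numbers. If a numeric column is needed, one possible alternative (a sketch, not the answerer's method) is to rank the years within each group and shift the result so the latest year becomes 0. The variable res is a made-up name, and the sketch assumes one row per group and year.

```r
# Rebuild the example data so the block is self-contained.
df <- data.frame(group = rep(c("aaa", "bbb", "ccc"), each = 3),
                 year  = rep(2016:2018, times = 3))
df2 <- df[-5, ]  # drop the bbb / 2017 row, as above

# Rank years within each group, then subtract the group size so the
# latest year in every group maps to 0.
df2$generation <- with(df2,
                       ave(year, group, FUN = function(x) rank(x) - length(x)))

# Sort by descending year for display.
res <- df2[order(-df2$year), ]
res
```

Here the generation is computed before sorting, so each value stays attached to its own row.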

Data

df <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L), .Label = c("aaa", "bbb", "ccc"), class = "factor"), 
    year = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L, 2016L, 
    2017L, 2018L)), class = "data.frame", row.names = c(NA, -9L
))

Answer 2 (score: 0)

An option with data.table:

library(data.table)
setDT(df)[, generation := year - max(year), group][order(- year)]
#    group year generation
#1:   aaa 2018          0
#2:   bbb 2018          0
#3:   ccc 2018          0
#4:   aaa 2017         -1
#5:   bbb 2017         -1
#6:   ccc 2017         -1
#7:   aaa 2016         -2
#8:   bbb 2016         -2
#9:   ccc 2016         -2