Bucketing rank values in R

Time: 2018-05-12 05:07:02

Tags: r dplyr reshape data-transform

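I am trying to bucket rank values into cycles. The step from rank 1 to rank 2 is cycle 1, the step from rank 2 to rank 3 is cycle 2, and so on. For each id I want a binary value per cycle (as shown below).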

Data frame before
id               event              date                   rank       
1241a21ef        one             2016-08-13 20:03:37         1
1241a21ef        two             2016-08-15 05:41:09         2
12426203b        two             2016-08-04 05:35:10         1
12426203b       three            2016-08-06 02:07:41         2
12426203b        two             2016-08-10 05:42:33         3
12426203b       three            2016-08-14 02:43:16         4

Data frame after
id           cycle1     cycle2   cycle3
1241a21ef      1          0         0
12426203b      1          1         1

Note: Within each group (i.e. id) the rank values are unique and based on the timestamp, and the rank resets to 1 for the next new id.

1 Answer:

Answer 0: (score: 1)

You can use dplyr::count and tidyr::spread to tabulate the data in the desired format:

library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  arrange(id, rank) %>%
  filter(rank != last(rank)) %>%              # drop the last rank of each id; it starts no cycle
  mutate(cycle = paste0("cycle", rank)) %>%   # column names wanted after spread
  group_by(id, cycle) %>%
  count() %>%                                 # one row per (id, cycle) with n = 1
  spread(key = cycle, value = n, fill = 0) %>%  # missing cycles become 0
  as.data.frame()

#          id cycle1 cycle2 cycle3
# 1 1241a21ef      1      0      0
# 2 12426203b      1      1      1
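
As a variation on the pipeline above: tidyr::spread has been superseded by tidyr::pivot_wider since tidyr 1.0.0. The following is only a sketch of the same idea with pivot_wider, assuming tidyr 1.1.0 or later (so that values_fill accepts a single value). The flag column is a helper name introduced here for illustration, and the sample data is rebuilt inline with only the id and rank columns so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, reduced to the two columns the reshape needs
df <- data.frame(
  id   = c("1241a21ef", "1241a21ef",
           "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id) %>%
  filter(rank != max(rank)) %>%              # the highest rank of an id starts no cycle
  ungroup() %>%
  mutate(cycle = paste0("cycle", rank),      # cycle k = step from rank k to rank k + 1
         flag  = 1) %>%                      # 'flag' is a hypothetical helper column
  select(id, cycle, flag) %>%
  pivot_wider(names_from = cycle,
              values_from = flag,
              values_fill = 0) %>%           # cycles an id never reached become 0
  as.data.frame()

print(result)
```

Using max(rank) instead of last(rank) makes the filter independent of row order, so the arrange step is not needed in this version.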

Data:

df <- read.table(text =
"id               event              date                   rank       
1241a21ef        one             '2016-08-13 20:03:37'         1
1241a21ef        two             '2016-08-15 05:41:09'         2
12426203b        two             '2016-08-04 05:35:10'         1
12426203b       three            '2016-08-06 02:07:41'         2
12426203b        two             '2016-08-10 05:42:33'         3
12426203b       three            '2016-08-14 02:43:16'         4",
header = TRUE, stringsAsFactors = FALSE)
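
If adding packages is not an option, the same table can probably be built in base R as well. The sketch below is not part of the original answer. It uses ave() to find the highest rank per id and table() to cross-tabulate the remaining rows; keep, tab and wide are names chosen here for illustration. The data is rebuilt inline so that the block is self-contained.

```r
# Sample data from the question, reduced to the two columns that matter
df <- data.frame(
  id   = c("1241a21ef", "1241a21ef",
           "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# TRUE for every row whose rank is below the highest rank of its id
keep <- df$rank < ave(df$rank, df$id, FUN = max)

# Cross-tabulate id against cycle label; combinations that do not occur count as 0
tab <- table(df$id[keep], paste0("cycle", df$rank[keep]))

# Convert the contingency table to a plain data frame (ids end up as row names)
wide <- as.data.frame.matrix(tab)
print(wide)
```

Note that here the ids are stored as row names rather than in an id column, and that table() sorts the column labels alphabetically, so with ten or more cycles cycle10 would come before cycle2.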