Question

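I am trying to bucket rank values into cycles. Moving from rank 1 to rank 2 is cycle 1, moving from rank 2 to rank 3 is cycle 2, and so on. I want to create a binary value for each cycle (as shown below).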

Data frame before

id               event              date                   rank       
1241a21ef        one             2016-08-13 20:03:37         1
1241a21ef        two             2016-08-15 05:41:09         2
12426203b        two             2016-08-04 05:35:10         1
12426203b       three            2016-08-06 02:07:41         2
12426203b        two             2016-08-10 05:42:33         3
12426203b       three            2016-08-14 02:43:16         4

Data frame after

id           cycle1     cycle2   cycle3
1241a21ef      1          0         0
12426203b      1          1         1

Note: within each group (i.e. each id) the rank values are unique and based on the timestamp, and the rank resets to 1 for the next new id.
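
For reference, a rank column like the one described in the note could be derived roughly as follows. This is only a sketch: the `events` and `ranked` names are made up for illustration, the timestamps are copied from the table above, and the UTC time zone is an assumption since the question does not state one.

```r
library(dplyr)

# Hypothetical input: the id and date columns from the question's table.
events <- data.frame(
  id   = c("1241a21ef", "1241a21ef",
           "12426203b", "12426203b", "12426203b", "12426203b"),
  date = c("2016-08-13 20:03:37", "2016-08-15 05:41:09",
           "2016-08-04 05:35:10", "2016-08-06 02:07:41",
           "2016-08-10 05:42:33", "2016-08-14 02:43:16"),
  stringsAsFactors = FALSE
)

ranked <- events %>%
  mutate(date = as.POSIXct(date, tz = "UTC")) %>%  # time zone is an assumption
  group_by(id) %>%
  mutate(rank = row_number(date)) %>%              # 1 = earliest event within each id
  ungroup()
```

Because the ranking is done inside `group_by(id)`, it restarts at 1 for every id, which matches the reset behaviour the note describes.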

Answer 1

You can use dplyr::count and tidyr::spread to tabulate the data in the desired format:

library(dplyr)
library(tidyr)

df %>% group_by(id) %>%
  arrange(id, rank) %>%   
  filter(rank != last(rank)) %>%   #drop last rank for each id
  mutate(cycle = paste0("cycle", rank)) %>%  #desired column names after spread
  group_by(id, cycle) %>%
  count() %>%
  spread(key = cycle, value = n, fill = 0) %>%
  as.data.frame()

#          id cycle1 cycle2 cycle3
# 1 1241a21ef      1      0      0
# 2 12426203b      1      1      1

Data:

# Each record must sit on its own line for read.table to parse it as a row
df <- read.table(text = "id event date rank
1241a21ef one '2016-08-13 20:03:37' 1
1241a21ef two '2016-08-15 05:41:09' 2
12426203b two '2016-08-04 05:35:10' 1
12426203b three '2016-08-06 02:07:41' 2
12426203b two '2016-08-10 05:42:33' 3
12426203b three '2016-08-14 02:43:16' 4",
header = TRUE, stringsAsFactors = FALSE)
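
As a side note, tidyr's documentation marks spread() as superseded by pivot_wider(), so the same reshaping can probably be written along the lines below. This is a sketch rather than the answer's original method: the `flag` column and the `cycles` name are introduced here purely for illustration, and the small data frame keeps only the two columns the reshaping needs.

```r
library(dplyr)
library(tidyr)

# Only id and rank are needed here; values are copied from the question.
df <- data.frame(
  id   = c("1241a21ef", "1241a21ef",
           "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

cycles <- df %>%
  group_by(id) %>%
  filter(rank != max(rank)) %>%            # the last rank of an id starts no cycle
  ungroup() %>%
  mutate(cycle = paste0("cycle", rank),    # column names wanted after reshaping
         flag  = 1) %>%                    # hypothetical helper: 1 = cycle present
  pivot_wider(id_cols = id,
              names_from = cycle,
              values_from = flag,
              values_fill = list(flag = 0)) %>%  # cycles an id never reached become 0
  as.data.frame()
```

Setting a constant `flag` instead of counting rows relies on the ranks being unique within each id, as the question states; if duplicates were possible, counting as in the answer above would be the safer choice.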

Bucketing rank values in R
