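I am trying to turn rank values into cycles. Going from rank 1 to rank 2 is cycle 1, going from rank 2 to rank 3 is cycle 2, and so on up to the fourth rank. For each id I want a binary value per cycle. The input data is shown first, followed by the desired output: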
id event date rank
1241a21ef one 2016-08-13 20:03:37 1
1241a21ef two 2016-08-15 05:41:09 2
12426203b two 2016-08-04 05:35:10 1
12426203b three 2016-08-06 02:07:41 2
12426203b two 2016-08-10 05:42:33 3
12426203b three 2016-08-14 02:43:16 4
id cycle1 cycle2 cycle3
1241a21ef 1 0 0
12426203b 1 1 1
Note: within each group (i.e. each id) the rank values are unique and based on the timestamp, and the rank resets to 1 for the next id.
Answer 0 (score: 1)
You can use dplyr::count and tidyr::spread to tabulate the data in the desired format:
library(dplyr)
library(tidyr)
df %>% group_by(id) %>%
  arrange(id, rank) %>%
  filter(rank != last(rank)) %>%             # drop the last rank for each id
  mutate(cycle = paste0("cycle", rank)) %>%  # desired column names after spread
  group_by(id, cycle) %>%
  count() %>%
  spread(key = cycle, value = n, fill = 0) %>%
  as.data.frame()
# id cycle1 cycle2 cycle3
# 1 1241a21ef 1 0 0
# 2 12426203b 1 1 1
Data:
df <- read.table(text =
"id event date rank
1241a21ef one '2016-08-13 20:03:37' 1
1241a21ef two '2016-08-15 05:41:09' 2
12426203b two '2016-08-04 05:35:10' 1
12426203b three '2016-08-06 02:07:41' 2
12426203b two '2016-08-10 05:42:33' 3
12426203b three '2016-08-14 02:43:16' 4",
header = TRUE, stringsAsFactors = FALSE)
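For readers on a newer tidyr, where spread() is superseded by pivot_wider(), the same idea can be sketched as follows. This is only an illustrative variant, not part of the original answer: the result name `res` and the helper column `value` are made up for the example, and the data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, rebuilt inline so the sketch is self-contained
df <- data.frame(
  id    = c("1241a21ef", "1241a21ef", rep("12426203b", 4)),
  event = c("one", "two", "two", "three", "two", "three"),
  date  = c("2016-08-13 20:03:37", "2016-08-15 05:41:09",
            "2016-08-04 05:35:10", "2016-08-06 02:07:41",
            "2016-08-10 05:42:33", "2016-08-14 02:43:16"),
  rank  = c(1L, 2L, 1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  filter(rank != max(rank)) %>%   # every rank except the last one starts a cycle
  ungroup() %>%
  # `value` is a hypothetical helper column: 1 marks "this cycle exists for this id"
  transmute(id, cycle = paste0("cycle", rank), value = 1L) %>%
  pivot_wider(names_from = cycle, values_from = value,
              values_fill = list(value = 0L)) %>%  # cycles an id never reached become 0
  as.data.frame()

print(res)
```

As with the spread() version, an id that has only a single row (rank 1 and nothing after it) is removed by the filter and will not appear in the result at all. If such ids should show up as a row of zeros, they would need to be joined back in afterwards.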