Verifying whether an id has passed through all stages

Time: 2018-02-06 09:24:57

Tags: r excel hawq

id   current stage   previous stages
1    06              05
1    06              03

2    04              03
2    04              02

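Suppose there are 5 stages (02, 03, and so on), and every id should pass through each of them. In the example above, id 1 skipped stages 04 and 02, while id 2 passed through all of them. In other words, the previous stages of an id should be its current stage minus 1, minus 2, and so on.

I need to identify the ids that skipped stages, using either R or a Hadoop query.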
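
The rule described above can be made concrete with a minimal base R sketch. It assumes the lowest stage is 02 (taken from the "02, 03, ..." example), and the names `first_stage` and `expected_previous` are hypothetical helpers introduced only for illustration.

```r
# Lowest stage number; assumed to be 2 based on the question's example.
first_stage <- 2L

# Stages an id should have visited before reaching `current`.
# Returns an empty vector when the id is still at the first stage.
expected_previous <- function(current, first = first_stage) {
  if (current <= first) integer(0) else seq(first, current - 1L)
}

# id 1 from the table: current stage 6, recorded previous stages 5 and 3.
skipped_id1 <- setdiff(expected_previous(6L), c(5L, 3L))

# id 2 from the table: current stage 4, recorded previous stages 3 and 2.
skipped_id2 <- setdiff(expected_previous(4L), c(3L, 2L))

print(skipped_id1)
print(skipped_id2)
```

Here `setdiff()` returns the expected stages that never show up among the recorded ones, so an empty result means nothing was skipped.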

1 Answer:

Answer 0: (score: 1)

If I understand the question correctly, you could try the following dplyr solution.

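library(dplyr)

df %>%
  group_by(id, current_stage) %>%
  # collapse the recorded previous stages into one string, highest first
  summarise(all_prev_stages = paste(sort(previous_stages, decreasing = TRUE), collapse = ",")) %>%
  # build the string expected when no stage was skipped: current_stage - 1 down to 2
  mutate(possible_prev_stages = paste(seq(current_stage - 1, 2), collapse = ",")) %>%
  # keep only the ids whose recorded stages differ from the expected ones
  filter(all_prev_stages != possible_prev_stages) %>%
  select(id)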

This gives the list of ids that skipped stages (i.e. id = 1 in the sample data):

     id
1     1

Sample data:

df <- structure(list(id = c(1L, 1L, 2L, 2L), current_stage = c(6L, 
6L, 4L, 4L), previous_stages = c(5L, 3L, 3L, 2L)), .Names = c("id", 
"current_stage", "previous_stages"), class = "data.frame", row.names = c(NA, 
-4L))
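
As a possible complement to the dplyr pipeline, the sketch below uses base R only and also reports which stages each id skipped. It is an illustration under stated assumptions, not part of the original answer: the lowest stage is assumed to be 2, each id is assumed to have a single current stage, and `first_stage` and `skipped_by_id` are names made up for this example. It also guards one edge case: `seq(current_stage - 1, 2)` counts upward (1, 2) when the current stage is 2, so an id still at the first stage would be compared against the wrong expected string.

```r
# Sample data from the answer, rebuilt with data.frame() so the block is self-contained.
df <- data.frame(
  id              = c(1L, 1L, 2L, 2L),
  current_stage   = c(6L, 6L, 4L, 4L),
  previous_stages = c(5L, 3L, 3L, 2L)
)

first_stage <- 2L  # assumption: lowest stage number, as in the question

# For each id, find the stages between first_stage and current_stage - 1
# that never appear in previous_stages.
skipped_by_id <- lapply(split(df, df$id), function(g) {
  current <- g$current_stage[1]  # assumes one current stage per id
  expected <- if (current > first_stage) seq(first_stage, current - 1L) else integer(0)
  setdiff(expected, g$previous_stages)
})

# Keep only the ids that skipped at least one stage.
skipped_by_id <- Filter(length, skipped_by_id)
print(skipped_by_id)
```

The result is a named list keyed by id, so `names(skipped_by_id)` gives the ids that skipped stages and each element lists the missing stage numbers.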