R (dplyr): Count the number of "1" observations before each "0" in a column, by ID

Time: 2018-07-19 02:33:43

Tags: r count dplyr

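I have a dataset with the variables ID, referredvisit, and timeperiod. ID identifies the individual visiting the clinic, and referredvisit indicates whether that visit resulted in a referral. In other words, referredvisit == 0 means the person was not referred to another clinic, while referredvisit == 1 means the patient was referred. timeperiod shows the order in which the individuals came in.

My dataset looks like this: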

timeperiod <- 1:18
ID <- c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY", "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM", "RICHIE", "RICHIE", "RICHIE")
referredvisit <- c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
df <- cbind(timeperiod, ID, referredvisit)
df <- as.data.frame(df)

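My goal is, for each referredvisit == 0, to count how many "1" rows precede it within the same ID, going back either to the start of the column (for the first 0) or to the previous 0 for that ID (for every other 0). I want to create a column that stores this count. The resulting dataset should look like this: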

df$result <- c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 1, 0, 2, 0)

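I actually tried following this link, but it assumes the IDs are already sorted, so it does not seem to work here. I thought dplyr might help, but I could not figure that out either. I would really appreciate any help!

Thanks in advance!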

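Edit: For better visualization, the result would look like the image below. That is only after I sorted it manually by ID, though. My real dataset will contain thousands of rows, so sorting the IDs by hand is not practical.
[Image: the expected result, sorted by ID]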

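2 Answers:

Answer 0 (score: 4)

The difference between the positions of successive zeros, minus 1, gives the number of ones preceding each zero. count_ones performs this calculation for a single ID; its argument is assumed to be a logical vector that is TRUE at the positions of the zeros. ave then applies it to each ID. No packages are used.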

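# is0 is TRUE where referredvisit is 0; the leading TRUE is a sentinel for the start of the ID
count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)
# ave() applies count_ones within each ID and keeps the original row order
transform(df, result = ave(referredvisit == 0, ID, FUN = count_ones))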

giving:

   timeperiod     ID referredvisit result
1           1    TOM             0      0
2           2    TOM             1      0
3           3  SALLY             1      0
4           4  SALLY             1      0
5           5 RICHIE             0      0
6           6    TOM             1      0
7           7    TOM             0      2
8           8  SALLY             1      0
9           9 RICHIE             0      0
10         10 RICHIE             0      0
11         11 RICHIE             1      0
12         12  SALLY             0      3
13         13    TOM             0      0
14         14    TOM             1      0
15         15    TOM             0      1
16         16 RICHIE             1      0
17         17 RICHIE             0      2
18         18 RICHIE             0      0
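
To make the one-liner easier to follow, here is a minimal sketch that walks through the same steps for a single ID. The vector tom is an illustrative copy of TOM's referredvisit values in visit order (rows 1, 2, 6, 7, 13, 14, 15 above); the names pos and gaps are introduced here only for explanation and are not part of the answer.

```r
# TOM's referredvisit values in visit order (illustrative copy of the data above)
tom <- c(0, 1, 1, 0, 0, 1, 0)
is0 <- tom == 0

# Prepend a sentinel TRUE so the first zero has something to count back to;
# every zero position is therefore shifted by one
pos <- which(c(TRUE, is0))    # 1 2 5 6 8

# Gap between consecutive zeros, minus 1, is the number of ones in between
gaps <- diff(pos) - 1         # 0 2 0 1

# Write the gaps into the zero positions; all other positions become 0
result <- replace(is0, is0, gaps)
result                        # 0 0 0 2 0 0 1
```

These values match the result column for TOM's rows in the output above.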

Answer 1 (score: 1)

Here is a tidyverse approach that reproduces your expected result (in the result2 column):

library(dplyr)

df %>%
    mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
    arrange(ID) %>%
    group_by(ID) %>%
    mutate(
        flag = c(0, diff(referredvisit) < 0),
        grp = cumsum(flag)) %>%
    group_by(ID, grp) %>%
    mutate(cms = cumsum(referredvisit)) %>%
    ungroup() %>%
    mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
    select(-cms, -grp, -flag)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 5          RICHIE            0.     0.      0.
# 2 9          RICHIE            0.     0.      0.
# 3 10         RICHIE            0.     0.      0.
# 4 11         RICHIE            1.     0.      0.
# 5 16         RICHIE            1.     0.      0.
# 6 17         RICHIE            0.     2.      2.
# 7 18         RICHIE            0.     0.      0.
# 8 3          SALLY             1.     0.      0.
# 9 4          SALLY             1.     0.      0.
#10 8          SALLY             1.     0.      0.
#11 12         SALLY             0.     3.      3.
#12 1          TOM               0.     0.      0.
#13 2          TOM               1.     0.      0.
#14 6          TOM               1.     0.      0.
#15 7          TOM               0.     2.      2.
#16 13         TOM               0.     0.      0.
#17 14         TOM               1.     0.      0.
#18 15         TOM               0.     1.      1.
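
The helper columns flag, grp, and cms are dropped at the end of the pipeline, so it may help to see what they hold. The following is a rough base R sketch for a single ID, assuming the vector x is an illustrative copy of RICHIE's referredvisit values in visit order; prev is a stand-in for dplyr's lag(cms) and is not a name used in the answer.

```r
# RICHIE's referredvisit values in visit order (illustrative copy of the data above)
x <- c(0, 0, 0, 1, 1, 0, 0)

flag <- c(0, diff(x) < 0)           # 1 where a 0 directly follows a 1
grp  <- cumsum(flag)                # run id; a new run starts at each flagged 0
cms  <- ave(x, grp, FUN = cumsum)   # running count of 1s within each run
prev <- c(NA, head(cms, -1))        # previous row's cms (stand-in for lag(cms))

# A flagged 0 receives the number of 1s accumulated in the run before it
result2 <- ifelse(flag == 1, prev, 0)

data.frame(x, flag, grp, cms, result2)
```

Only the sixth row is flagged, and it picks up the value 2 from the preceding run, which agrees with RICHIE's rows in the output above.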

Update

To keep the original row order, you can do the following:

library(dplyr)
library(tibble)  # provides rowid_to_column()

df %>%
    rowid_to_column("row") %>%
    mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
    arrange(ID) %>%
    group_by(ID) %>%
    mutate(
        flag = c(0, diff(referredvisit) < 0),
        grp = cumsum(flag)) %>%
    group_by(ID, grp) %>%
    mutate(cms = cumsum(referredvisit)) %>%
    ungroup() %>%
    mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
    arrange(row) %>%
    select(-cms, -grp, -flag, -row)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 1          TOM               0.     0.      0.
# 2 2          TOM               1.     0.      0.
# 3 3          SALLY             1.     0.      0.
# 4 4          SALLY             1.     0.      0.
# 5 5          RICHIE            0.     0.      0.
# 6 6          TOM               1.     0.      0.
# 7 7          TOM               0.     2.      2.
# 8 8          SALLY             1.     0.      0.
# 9 9          RICHIE            0.     0.      0.
#10 10         RICHIE            0.     0.      0.
#11 11         RICHIE            1.     0.      0.
#12 12         SALLY             0.     3.      3.
#13 13         TOM               0.     0.      0.
#14 14         TOM               1.     0.      0.
#15 15         TOM               0.     1.      1.
#16 16         RICHIE            1.     0.      0.
#17 17         RICHIE            0.     2.      2.
#18 18         RICHIE            0.     0.      0.
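
As a possible shortcut, the count_ones helper from the first answer can also be called inside a grouped mutate(), which keeps the rows in their original order without arrange() or a row id. This is only a sketch, not something either answer proposes: it assumes the rows are already in visit order within each ID, it builds the data with data.frame() so that referredvisit stays numeric, and the column name result3 is made up for illustration.

```r
library(dplyr)

# Same data as in the question, built with data.frame() to keep the column types
df <- data.frame(
  timeperiod = 1:18,
  ID = c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY",
         "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM",
         "RICHIE", "RICHIE", "RICHIE"),
  referredvisit = c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

# Helper from the first answer: number of ones before each zero, for one ID
count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)

# mutate() on grouped data runs count_ones per ID and leaves the row order unchanged
out <- df %>%
  group_by(ID) %>%
  mutate(result3 = count_ones(referredvisit == 0)) %>%
  ungroup()

out$result3
```

If the rows are not in visit order, sorting by timeperiod first would be needed before the grouped step.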