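I have a dataset with three variables: ID, referredvisit, and timeperiod. ID identifies the person who visited, and referredvisit indicates whether that observation was recommended for referral. In other words, referredvisit == 0 means the person was not referred to another clinic, while referredvisit == 1 means the patient was referred. timeperiod shows the order in which the individuals came in.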
My dataset looks like this:
timeperiod <- 1:18
ID <- c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY", "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM", "RICHIE", "RICHIE", "RICHIE")
referredvisit <- c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)
df <- cbind(timeperiod, ID, referredvisit)
df <- as.data.frame(df)
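One thing to be aware of: cbind() on vectors of mixed type produces a character matrix, so after as.data.frame() every column is character (or factor in R versions before 4.0), not numeric. A minimal sketch of an alternative construction that keeps the column types is shown here. The name df_typed is only illustrative and is not part of the original question.

```r
# Same data as in the question, rebuilt so column types are preserved.
timeperiod <- 1:18
ID <- c("TOM", "TOM", "SALLY", "SALLY", "RICHIE", "TOM", "TOM", "SALLY",
        "RICHIE", "RICHIE", "RICHIE", "SALLY", "TOM", "TOM", "TOM",
        "RICHIE", "RICHIE", "RICHIE")
referredvisit <- c(0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0)

# data.frame() keeps timeperiod as integer and referredvisit as numeric;
# df_typed is a hypothetical name used only for this illustration.
df_typed <- data.frame(timeperiod, ID, referredvisit, stringsAsFactors = FALSE)
str(df_typed)
```

With the columns typed this way, no as.numeric(as.character(...)) conversion is needed before doing arithmetic on referredvisit.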
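My goal is, for each row with referredvisit == 0, to count how many 1s precede it, looking back either to the start of the column (for an ID's first 0) or to the previous 0 for the same ID (for the remaining 0s). I want to create a column that stores this count. The resulting dataset should look like this: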
df$result <- c(0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0, 0, 1, 0, 2, 0)
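I was actually trying to follow this link, but because that link assumes the IDs are already sorted, it doesn't seem to work here. I thought dplyr might help, but I couldn't figure that out either. I would really appreciate any help!
Thanks in advance!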
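Edit: For better visualization, the result would look like this. But that is only after I manually sorted it by ID. Because my original dataset contains thousands of rows, sorting by ID by hand is not practical.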
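The manual sort mentioned in the edit can be done programmatically with order(). The following is only a sketch on a small made-up data frame (df_demo and sorted are hypothetical names), assuming timeperiod is numeric so that it can serve as a tie-breaker within each ID.

```r
# Small made-up example; df_demo is not the data from the question.
df_demo <- data.frame(
  timeperiod = 1:6,
  ID = c("TOM", "SALLY", "TOM", "RICHIE", "SALLY", "TOM"),
  stringsAsFactors = FALSE
)

# Sort by ID first, then by timeperiod within each ID.
sorted <- df_demo[order(df_demo$ID, df_demo$timeperiod), ]
sorted
```

The original row order can be restored afterwards by sorting on timeperiod again.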
Answer 0 (score: 4)
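The difference between the positions of consecutive zeros, minus 1, gives the number of ones that precede each zero. count_ones performs that calculation for a single ID; its argument is assumed to be a logical vector that is TRUE at the positions of the zeros. ave is then used to apply it to each ID separately, so the data does not need to be sorted by ID. No packages are used.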
count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)
transform(df, result = ave(referredvisit == 0, ID, FUN = count_ones))
giving:
   timeperiod     ID referredvisit result
1           1    TOM             0      0
2           2    TOM             1      0
3           3  SALLY             1      0
4           4  SALLY             1      0
5           5 RICHIE             0      0
6           6    TOM             1      0
7           7    TOM             0      2
8           8  SALLY             1      0
9           9 RICHIE             0      0
10         10 RICHIE             0      0
11         11 RICHIE             1      0
12         12  SALLY             0      3
13         13    TOM             0      0
14         14    TOM             1      0
15         15    TOM             0      1
16         16 RICHIE             1      0
17         17 RICHIE             0      2
18         18 RICHIE             0      0
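To make the position-difference idea concrete, here is a small walk-through of count_ones on the values for a single ID. It uses TOM's referredvisit values in timeperiod order (rows 1, 2, 6, 7, 13, 14, 15 of the data above); the intermediate names tom, zero_pos and gaps are only illustrative.

```r
count_ones <- function(is0) replace(is0, is0, diff(which(c(TRUE, is0))) - 1)

# TOM's referredvisit values, in order of timeperiod.
tom <- c(0, 1, 1, 0, 0, 1, 0)
is0 <- tom == 0                    # TRUE at the positions of the zeros

# Prepending TRUE adds a virtual zero before the first row, so the first
# real zero also gets a "previous zero" to measure from.
zero_pos <- which(c(TRUE, is0))    # 1 2 5 6 8
gaps <- diff(zero_pos) - 1         # 0 2 0 1: ones between consecutive zeros

# replace() writes the gaps into the zero positions and leaves 0 elsewhere.
count_ones(is0)                    # 0 0 0 2 0 0 1
```

These values match the result column for the TOM rows in the output above.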
Answer 1 (score: 1)
Here is a tidyverse approach that reproduces your expected result (in the result2 column):
library(tidyverse)

df %>%
  mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
  arrange(ID) %>%
  group_by(ID) %>%
  mutate(
    flag = c(0, diff(referredvisit) < 0),  # 1 where a 0 directly follows a 1
    grp = cumsum(flag)) %>%                # run counter within each ID
  group_by(ID, grp) %>%
  mutate(cms = cumsum(referredvisit)) %>%  # running count of 1s within each run
  ungroup() %>%
  mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
  select(-cms, -grp, -flag)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 5          RICHIE            0.     0.      0.
# 2 9          RICHIE            0.     0.      0.
# 3 10         RICHIE            0.     0.      0.
# 4 11         RICHIE            1.     0.      0.
# 5 16         RICHIE            1.     0.      0.
# 6 17         RICHIE            0.     2.      2.
# 7 18         RICHIE            0.     0.      0.
# 8 3          SALLY             1.     0.      0.
# 9 4          SALLY             1.     0.      0.
#10 8          SALLY             1.     0.      0.
#11 12         SALLY             0.     3.      3.
#12 1          TOM               0.     0.      0.
#13 2          TOM               1.     0.      0.
#14 6          TOM               1.     0.      0.
#15 7          TOM               0.     2.      2.
#16 13         TOM               0.     0.      0.
#17 14         TOM               1.     0.      0.
#18 15         TOM               0.     1.      1.
To keep the original row order, you can do:
df %>%
  rowid_to_column("row") %>%               # remember the original row position
  mutate(referredvisit = as.numeric(as.character(referredvisit))) %>%
  arrange(ID) %>%
  group_by(ID) %>%
  mutate(
    flag = c(0, diff(referredvisit) < 0),
    grp = cumsum(flag)) %>%
  group_by(ID, grp) %>%
  mutate(cms = cumsum(referredvisit)) %>%
  ungroup() %>%
  mutate(result2 = ifelse(flag == 1, lag(cms), 0)) %>%
  arrange(row) %>%                         # restore the original row order
  select(-cms, -grp, -flag, -row)
## A tibble: 18 x 5
#   timeperiod ID     referredvisit result result2
#   <fct>      <fct>          <dbl>  <dbl>   <dbl>
# 1 1          TOM               0.     0.      0.
# 2 2          TOM               1.     0.      0.
# 3 3          SALLY             1.     0.      0.
# 4 4          SALLY             1.     0.      0.
# 5 5          RICHIE            0.     0.      0.
# 6 6          TOM               1.     0.      0.
# 7 7          TOM               0.     2.      2.
# 8 8          SALLY             1.     0.      0.
# 9 9          RICHIE            0.     0.      0.
#10 10         RICHIE            0.     0.      0.
#11 11         RICHIE            1.     0.      0.
#12 12         SALLY             0.     3.      3.
#13 13         TOM               0.     0.      0.
#14 14         TOM               1.     0.      0.
#15 15         TOM               0.     1.      1.
#16 16         RICHIE            1.     0.      0.
#17 17         RICHIE            0.     2.      2.
#18 18         RICHIE            0.     0.      0.
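To see what the intermediate flag, grp and cms columns hold, the same steps can be traced for a single ID in base R. This is only a sketch, using RICHIE's referredvisit values in timeperiod order (rows 5, 9, 10, 11, 16, 17, 18); lag_cms is an illustrative stand-in for dplyr's lag(cms).

```r
# RICHIE's referredvisit values, in order of timeperiod.
richie <- c(0, 0, 0, 1, 1, 0, 0)

flag <- c(0, diff(richie) < 0)            # 0 0 0 0 0 1 0: a 0 right after a 1
grp  <- cumsum(flag)                      # 0 0 0 0 0 1 1: a new run starts at each flag
cms  <- ave(richie, grp, FUN = cumsum)    # 0 0 0 1 2 0 0: running count of 1s per run

# Base R equivalent of lag(cms): shift the values down by one row.
lag_cms <- c(NA, head(cms, -1))

# At a flagged row, the previous row holds the total number of 1s in the prior run.
result2 <- ifelse(flag == 1, lag_cms, 0)  # 0 0 0 0 0 2 0
result2
```

In the full pipeline lag() is applied after ungroup(), so it looks across ID boundaries, but flag is always 0 on the first row of each ID, so the value pulled from the previous ID is never used.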