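I am studying the communication pathways of patients when they fall ill. For example: a person falls ill, then sees a doctor (A), then goes to the hospital (B), then contacts the insurer (C), and so on. The order differs per patient: one patient goes straight to the hospital, while another checks with the insurer first. We follow patients through the whole process, and after each contact with an institution we ask them to fill in another survey. So after every institution ("step") we get a survey score. This gives me the following dataset setup (the real dataset is very large):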
Patient<-c(1,1,1,1,1,1,1,2,2,2,2)
sample6<-c("A","A","A","A","A","A","A","A","A","A","A")
sample5<-c("Stop","B","B","B","B","B","B","Stop","C","C","C")
sample4<-c(NA,"Stop","C","C","C","C","C",NA, "Stop","F","F")
sample3<-c(NA,NA,"Stop","D","D","D","D",NA, NA,"Stop","G")
sample2<-c(NA,NA,NA,"Stop","E","E","E",NA, NA,NA,"Stop")
sample1<-c(NA,NA,NA,NA, "Stop","F","F",NA,NA,NA, NA)
sample0<-c(NA,NA,NA,NA, NA,"Stop","G",NA,NA,NA, NA)
sample00<-c(NA,NA,NA,NA, NA,NA,"Stop",NA,NA,NA, NA)
Score<-c(90,88,65,44,78,98,66,38,93,88,80)
Time<-c("01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018", "06-01-2018", "07-01-2018","01-02-2018", "02-02-2018", "05-02-2018", "06-02-2018")
df<-data.frame("Patient"=Patient, "step0"=sample6, "step1"=sample5, "step2"=sample4, "step3"=sample3, "step4"=sample2,
"step5"=sample1,"step6"= sample0, "step7"=sample00, "Score"=Score, "Time"=Time)
> df
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time
1 1 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 90 01-01-2018
2 1 A B Stop <NA> <NA> <NA> <NA> <NA> 88 02-01-2018
3 1 A B C Stop <NA> <NA> <NA> <NA> 65 03-01-2018
4 1 A B C D Stop <NA> <NA> <NA> 44 04-01-2018
5 1 A B C D E Stop <NA> <NA> 78 05-01-2018
6 1 A B C D E F Stop <NA> 98 06-01-2018
7 1 A B C D E F G Stop 66 07-01-2018
8 2 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 38 01-02-2018
9 2 A C Stop <NA> <NA> <NA> <NA> <NA> 93 02-02-2018
10 2 A C F Stop <NA> <NA> <NA> <NA> 88 05-02-2018
11 2 A C F G Stop <NA> <NA> <NA> 80 06-02-2018
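So, for example, row 1 holds the survey score after institution A, row 2 is for the same patient and holds the survey score after institution B, and so on. Now I want to compare rows that share the same final step. I will use "F" as the example, but for other analyses it could just as well be "C". So I want to select all rows that have "F" as the final institution, plus the row immediately before each of them, so that the two can be compared.
So I want to create this dataset: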
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time Indicator
1 1 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 90 01-01-2018 0
2 1 A B Stop <NA> <NA> <NA> <NA> <NA> 88 02-01-2018 0
3 1 A B C Stop <NA> <NA> <NA> <NA> 65 03-01-2018 0
4 1 A B C D Stop <NA> <NA> <NA> 44 04-01-2018 0
5 1 A B C D E Stop <NA> <NA> 78 05-01-2018 Before
6 1 A B C D E F Stop <NA> 98 06-01-2018 After
7 1 A B C D E F G Stop 66 07-01-2018 0
8 2 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 38 01-02-2018 0
9 2 A C Stop <NA> <NA> <NA> <NA> <NA> 93 02-02-2018 Before
10 2 A C F Stop <NA> <NA> <NA> <NA> 88 05-02-2018 After
11 2 A C F G Stop <NA> <NA> <NA> 80 06-02-2018 0
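I did manage to flag the rows that contain "F", plus the row before each of them:
ProcessColumns <- 2:9
d <- df[, ProcessColumns] == "F"
df$Indicator <- rowSums(d, na.rm = TRUE)
df$Indicator[which(df$Indicator %in% 1) - 1] <- "Before"
df$Indicator[which(df$Indicator %in% 1)] <- "After"
But this flags every row that contains "F", not only the rows where "F" is the final step. Can anyone help me?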
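Answer 0 (score: 2)
We can do
library(dplyr)
# sum = non-NA step columns per row; max = offset from the patient's longest pathway
df %>% mutate(sum = rowSums(!is.na(.[2:9]))) %>%
  group_by(Patient) %>%
  mutate(max = sum - max(sum), Indicator = case_when(max == -2 ~ "Before", max == -1 ~ "After", TRUE ~ as.character(0)))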
# A tibble: 11 x 14
# Groups: Patient [2]
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time sum max Ind
<dbl> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <dbl> <fct> <dbl> <dbl> <chr>
1 1.00 A Stop NA NA NA NA NA NA 90.0 01-01-2018 2.00 -6.00 0
2 1.00 A B Stop NA NA NA NA NA 88.0 02-01-2018 3.00 -5.00 0
3 1.00 A B C Stop NA NA NA NA 65.0 03-01-2018 4.00 -4.00 0
4 1.00 A B C D Stop NA NA NA 44.0 04-01-2018 5.00 -3.00 0
5 1.00 A B C D E Stop NA NA 78.0 05-01-2018 6.00 -2.00 Before
6 1.00 A B C D E F Stop NA 98.0 06-01-2018 7.00 -1.00 After
7 1.00 A B C D E F G Stop 66.0 07-01-2018 8.00 0 0
8 2.00 A Stop NA NA NA NA NA NA 38.0 01-02-2018 2.00 -3.00 0
9 2.00 A C Stop NA NA NA NA NA 93.0 02-02-2018 3.00 -2.00 Before
10 2.00 A C F Stop NA NA NA NA 88.0 05-02-2018 4.00 -1.00 After
11 2.00 A C F G Stop NA NA NA 80.0 06-02-2018 5.00 0 0
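Note that the pipeline above labels rows by their distance from each patient's longest pathway, so it matches the desired output here because "F" is the second-to-last institution for both patients in the sample data. To check which institution a row actually ends on, one option is to read the value just before "Stop". The following is a minimal base R sketch on toy data; the `steps` data frame and the `last_step` name are illustrative assumptions, not part of the question or the answer.

```r
# Toy data in the same shape as the step columns of the question's df
# (values are illustrative only).
steps <- data.frame(
  step0 = c("A", "A", "A"),
  step1 = c("Stop", "C", "C"),
  step2 = c(NA, "Stop", "F"),
  step3 = c(NA, NA, "Stop"),
  stringsAsFactors = FALSE
)

# For each row, locate "Stop" and return the value immediately before it.
# Rows without "Stop", or with "Stop" in the first column, give NA.
last_step <- apply(steps, 1, function(r) {
  stop_pos <- match("Stop", r)
  if (is.na(stop_pos) || stop_pos < 2) NA_character_ else r[[stop_pos - 1]]
})

# TRUE only for rows whose final institution is "F".
ends_in_F <- !is.na(last_step) & last_step == "F"
```

For the three toy rows the final institutions are A, C and F, so only the third row is flagged.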
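Update: inspired by @Andre Elrico
library(dplyr); library(tidyr); library(stringr)
# "BStop" flags rows whose final step is B; use "FStop" for the F example
df %>% unite(All, matches("step"), sep = "", remove = FALSE) %>%
  mutate(Ind = str_detect(All, "BStop"), Indicator = case_when(lead(Ind) == TRUE ~ "Before", Ind == TRUE ~ "After", TRUE ~ as.character(0))) %>%
  select(-All, -Ind)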
Answer 1 (score: 1)
Or you could:
library(dplyr)
After_IND <- df %>% apply(.,1,paste,collapse="") %>% grepl("FStop",.)
Before_IND<- lead(After_IND,1,F)
df$Indicator <- 0
df$Indicator[After_IND]<-"After"
df$Indicator[Before_IND]<-"Before"
# Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time Indicator
# 1 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 90 01-01-2018 0
# 1 A B Stop <NA> <NA> <NA> <NA> <NA> 88 02-01-2018 0
# 1 A B C Stop <NA> <NA> <NA> <NA> 65 03-01-2018 0
# 1 A B C D Stop <NA> <NA> <NA> 44 04-01-2018 0
# 1 A B C D E Stop <NA> <NA> 78 05-01-2018 Before
# 1 A B C D E F Stop <NA> 98 06-01-2018 After
# 1 A B C D E F G Stop 66 07-01-2018 0
# 2 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 38 01-02-2018 0
# 2 A C Stop <NA> <NA> <NA> <NA> <NA> 93 02-02-2018 Before
# 2 A C F Stop <NA> <NA> <NA> <NA> 88 05-02-2018 After
# 2 A C F G Stop <NA> <NA> <NA> 80 06-02-2018 0
Please note:
If you want to compare B instead, for example, you have to change:
... %>% grepl("BStop",.)
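Rather than editing the pattern by hand for every institution, the same paste-and-match idea could be wrapped in a small function that takes the target as an argument. The sketch below is only an illustration: `flag_final()` and the `toy` data are hypothetical names, it assumes single-letter institution codes (so that "FStop" cannot appear inside a longer name), and it adds a check that "Before" is only assigned when the next row belongs to the same patient.

```r
# Hypothetical helper (not from the answer): flag rows whose pathway ends in `target`.
flag_final <- function(data, target, step_cols) {
  # Concatenate only the step columns; NA is pasted as the text "NA".
  pathway <- apply(as.matrix(data[, step_cols, drop = FALSE]), 1,
                   paste, collapse = "")
  after <- grepl(paste0(target, "Stop"), pathway, fixed = TRUE)

  # Shift the flag up by one row, but only when the next row is the same patient.
  n <- nrow(data)
  same_patient <- c(data$Patient[-n] == data$Patient[-1], FALSE)
  before <- c(after[-1], FALSE) & same_patient

  data$Indicator <- "0"
  data$Indicator[after] <- "After"
  data$Indicator[before] <- "Before"
  data
}

# Toy data (illustrative values): row 6 contains "F" but does not end in it.
toy <- data.frame(
  Patient = c(1, 1, 1, 2, 2, 2),
  step0 = "A",
  step1 = c("Stop", "B", "B", "Stop", "F", "F"),
  step2 = c(NA, "Stop", "F", NA, "Stop", "G"),
  step3 = c(NA, NA, "Stop", NA, NA, "Stop"),
  stringsAsFactors = FALSE
)
step_cols <- c("step0", "step1", "step2", "step3")

res_F <- flag_final(toy, "F", step_cols)
res_B <- flag_final(toy, "B", step_cols)
```

With the question's data the call would look like `flag_final(df, "F", paste0("step", 0:7))`. Restricting the paste to the step columns keeps `Score` and `Time` out of the matched string.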
Answer 2 (score: 0)
A tidyverse approach that takes many lines, but it generally works.
library(tidyverse)
df %>%
rownames_to_column() %>%
gather(k,v,-Patient,-rowname,-Score, -Time) %>%
group_by(rowname) %>%
mutate(Indicator=ifelse(any(v %in%"F" ),"After",NA)) %>%
spread(k,v) %>%
arrange(as.numeric(rowname)) %>%
group_by(Patient) %>%
mutate(Indicator=ifelse(duplicated(Indicator), NA, Indicator)) %>%
mutate(Indicator2=ifelse(lead(Indicator) == "After", "Before", NA)) %>%
mutate(Indicator=ifelse(!is.na(Indicator2), Indicator2, Indicator)) %>%
select(Patient, starts_with("step"), Score, Time,Indicator, -Indicator2,-rowname) %>%
ungroup()
# A tibble: 11 x 12
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time Indicator
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <fct> <chr>
1 1 A Stop NA NA NA NA NA NA 90 01-01-2018 NA
2 1 A B Stop NA NA NA NA NA 88 02-01-2018 NA
3 1 A B C Stop NA NA NA NA 65 03-01-2018 NA
4 1 A B C D Stop NA NA NA 44 04-01-2018 NA
5 1 A B C D E Stop NA NA 78 05-01-2018 Before
6 1 A B C D E F Stop NA 98 06-01-2018 After
7 1 A B C D E F G Stop 66 07-01-2018 NA
8 2 A Stop NA NA NA NA NA NA 38 01-02-2018 NA
9 2 A C Stop NA NA NA NA NA 93 02-02-2018 Before
10 2 A C F Stop NA NA NA NA 88 05-02-2018 After
11 2 A C F G Stop NA NA NA 80 06-02-2018 NA