I have a dataset of patient IDs and their disease test results, as shown below:
Id Result
1 Strep A: Positive
2 Flu A: Negative, Flu B: Negative
3 Rsv: Positive, RsvA: Negative, RsvB: Positive
4 Strep A: Negative
5 Flu A: Negative, Flu B: Negative
6 Flu A: Negative, Flu B: Negative
7 Strep A: Positive
How can I split the Result column into one column per test, as follows:
Id Result_Strep A Result_Flu A Result_Flu B Result_Rsv Result_RsvA Result_RsvB
1 Positive NA NA NA NA NA
2 NA Negative Negative NA NA NA
3 NA NA NA Positive Negative Positive
4 Negative NA NA NA NA NA
5 NA Negative Negative NA NA NA
6 NA Negative Negative NA NA NA
7 Positive NA NA NA NA NA
Data input:
structure(list(Id = 1:7, Result = c("Strep A: Positive", "Flu A: Negative, Flu B: Negative",
"Rsv: Positive, RsvA: Negative, RsvB: Positive", "Strep A: Negative",
"Flu A: Negative, Flu B: Negative", "Flu A: Negative, Flu B: Negative",
"Strep A: Positive")), row.names = c(NA, -7L), class = "data.frame")
Answer (score: 2)
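We can use separate_rows to split at ", ", then separate the column into two parts (test name and result) at the colon, and reshape to "wide" format with pivot_wider: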
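To see what pivot_wider() receives, the first two steps can be run on their own. This is a minimal sketch using only the first three rows of the question's data; the column names Test and Value are illustrative choices, not taken from the answer. Each row of the result holds one test and its result for one Id:

```r
library(dplyr)
library(tidyr)

# First three rows of the sample data from the question
df1 <- data.frame(
  Id = 1:3,
  Result = c("Strep A: Positive",
             "Flu A: Negative, Flu B: Negative",
             "Rsv: Positive, RsvA: Negative, RsvB: Positive"),
  stringsAsFactors = FALSE
)

long <- df1 %>%
  # one row per "Test: Value" pair
  separate_rows(Result, sep = ", ") %>%
  # split each pair at the colon (and any spaces after it)
  separate(Result, into = c("Test", "Value"), sep = ":\\s*")

print(long)
```

This long table (6 rows for these 3 patients) is exactly the shape that pivot_wider() spreads into one column per test.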
library(dplyr)
library(tidyr)
library(stringr)
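df1 %>%
  # 1. one row per "Test: Value" pair
  separate_rows(Result, sep = ", ") %>%
  # 2. split each pair into the test name and its result
  separate(Result, into = c("Result1", "Result2"), sep = ":\\s*") %>%
  # 3. prefix the test names so they become the new column names
  mutate(Result1 = str_c("Result_", Result1)) %>%
  # if a test can repeat within the same Id, uncomment the commented code below
  #group_by(Id, Result1) %>%
  #mutate(rn = row_number()) %>%
  #ungroup() %>%
  # 4. reshape to wide: one column per test, NA where a test was not run
  pivot_wider(names_from = Result1, values_from = Result2)# %>%
  #select(-rn)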
# A tibble: 7 x 7
# Id `Result_Strep A` `Result_Flu A` `Result_Flu B` Result_Rsv Result_RsvA Result_RsvB
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 Positive <NA> <NA> <NA> <NA> <NA>
#2 2 <NA> Negative Negative <NA> <NA> <NA>
#3 3 <NA> <NA> <NA> Positive Negative Positive
#4 4 Negative <NA> <NA> <NA> <NA> <NA>
#5 5 <NA> Negative Negative <NA> <NA> <NA>
#6 6 <NA> Negative Negative <NA> <NA> <NA>
#7 7 Positive <NA> <NA> <NA> <NA> <NA>
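The commented-out lines matter only when the same test appears more than once for a patient. Below is a sketch with a hypothetical input (the dup data frame is made up for illustration). Without the rn column, pivot_wider() warns that the values are not uniquely identified and returns list-columns; with rn, each repeat gets its own row. Grouping by Id as well as the test name keeps rn counting per patient; grouping by the test name alone would number repeats across patients and could split a patient who has no duplicates across several rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: patient 1 has two Strep A results in one string
dup <- data.frame(
  Id = c(1L, 2L),
  Result = c("Strep A: Positive, Strep A: Negative",
             "Strep A: Negative, Flu A: Negative"),
  stringsAsFactors = FALSE
)

wide <- dup %>%
  separate_rows(Result, sep = ", ") %>%
  separate(Result, into = c("Test", "Value"), sep = ":\\s*") %>%
  mutate(Test = paste0("Result_", Test)) %>%
  # number repeated tests within each patient so (Id, rn, Test) is unique
  group_by(Id, Test) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = Test, values_from = Value) %>%
  select(-rn)

print(wide)
```

Here patient 1 ends up on two rows (one per Strep A result), while patient 2 stays on a single row.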
Data:
df1 <- structure(list(Id = 1:7, Result = c("Strep A: Positive",
"Flu A: Negative, Flu B: Negative",
"Rsv: Positive, RsvA: Negative, RsvB: Positive", "Strep A: Negative",
"Flu A: Negative, Flu B: Negative", "Flu A: Negative, Flu B: Negative",
"Strep A: Positive")), class = "data.frame", row.names = c(NA,
-7L))
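If the tidyverse packages are not available, the same reshape can be sketched in base R with strsplit() and match(). This is an alternative technique, not part of the original answer; it assumes each test appears at most once per Id (match() keeps only the first occurrence) and that test names contain no colon:

```r
# Same data as df1 above
df1 <- data.frame(
  Id = 1:7,
  Result = c("Strep A: Positive", "Flu A: Negative, Flu B: Negative",
             "Rsv: Positive, RsvA: Negative, RsvB: Positive", "Strep A: Negative",
             "Flu A: Negative, Flu B: Negative", "Flu A: Negative, Flu B: Negative",
             "Strep A: Positive"),
  stringsAsFactors = FALSE
)

# Split each string into "Test: Value" pairs, repeating the Id for each pair
pairs <- strsplit(df1$Result, ", ", fixed = TRUE)
long <- data.frame(
  Id = rep(df1$Id, lengths(pairs)),
  kv = unlist(pairs),
  stringsAsFactors = FALSE
)
# Text before the first colon is the test name, text after it is the result
long$Test <- paste0("Result_", sub(":.*$", "", long$kv))
long$Value <- sub("^[^:]*:\\s*", "", long$kv)

# One row per Id, one column per test in order of first appearance;
# match() yields NA where a patient has no result for that test
wide <- data.frame(Id = df1$Id)
for (test in unique(long$Test)) {
  rows <- long[long$Test == test, ]
  wide[[test]] <- rows$Value[match(wide$Id, rows$Id)]
}

print(wide)
```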