I have a data frame with a column named Disease, and a list of diseases (the unique values of the Disease column), like this:
Disease
--------------------------
Diabetes, Blood Pressure
Diabetes
Anemia
No
Blood Pressure,Anemia
I tried using the sapply function as shown below:
xx<-sapply(my_data$Disease, function(x) is.element(toString(stri_split_fixed(x,","))[[1]],unlist(Disease_List))[[1]] + 0)
Output
> xx
0 1 1 0 0
It treats a comma-separated value as a single new value that is not in the list, and returns 0 for it.
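The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: `disease_list` here is an assumed stand-in for `Disease_List`, whose exact contents the question does not show, and `cell` is a hypothetical name for one value of the Disease column.

```r
# Assumed lookup vector; the question does not show Disease_List itself.
disease_list <- c("Diabetes", "Blood Pressure", "Anemia")

# One cell of the Disease column that holds two diseases.
cell <- "Diabetes, Blood Pressure"

# Compared as one string, the whole cell is not an element of the list.
whole_match <- is.element(cell, disease_list)

# Splitting on the comma (plus any following spaces) first
# yields one membership test per disease in the cell.
parts <- strsplit(cell, ",\\s*")[[1]]
split_match <- is.element(parts, disease_list)
```

In this sketch `whole_match` is `FALSE`, while `split_match` holds one `TRUE` per disease, which suggests the membership test has to run on the split pieces rather than on the original string.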
I need output like this:
Diabetes Blood Pressure Anemia
1 1 0
1 0 0
0 0 1
0 0 0
0 1 1
Answer 0 (score: 1)
cbind(my_data,+Vectorize(grepl)(disease_list,my_data['Disease']))
Disease Diabetes Blood Pressure Anemia No
1 Diabetes, Blood Pressure 1 1 0 0
2 Diabetes 1 0 0 0
3 Anemia 0 0 1 0
4 No 0 0 0 1
5 Blood Pressure,Anemia 0 1 1 0
You can also use
+sapply(disease_list,grepl,my_data$Disease)
where
my_data = read.table(col.names = 'Disease',
stringsAsFactors = FALSE,
strip.white = TRUE,
sep = '|',
text = ' Diabetes, Blood Pressure
Diabetes
Anemia
No
Blood Pressure,Anemia')
disease_list = unique(trimws(unlist(strsplit(as.character(my_data$Disease),','))))
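One caveat with `grepl` is that it matches patterns anywhere inside the string, so a disease name that is a substring of another name would be counted for both. If exact matching matters, a base R variant that splits each cell and compares whole tokens might look like the sketch below. It assumes the lookup vector excludes "No" (as in the desired output in the question); `tokens` and `indicator` are hypothetical names introduced for illustration.

```r
# Sample data from the question.
my_data <- data.frame(
  Disease = c("Diabetes, Blood Pressure", "Diabetes", "Anemia",
              "No", "Blood Pressure,Anemia"),
  stringsAsFactors = FALSE
)

# Assumed lookup vector; "No" is left out so that it maps to all zeros.
disease_list <- c("Diabetes", "Blood Pressure", "Anemia")

# Split every cell on a comma followed by optional whitespace.
tokens <- strsplit(my_data$Disease, ",\\s*")

# For each row, test which diseases appear as whole tokens;
# sapply returns one column per row, so transpose, and use
# unary + to turn the logical matrix into 0/1 integers.
indicator <- +t(sapply(tokens, function(x) disease_list %in% x))
colnames(indicator) <- disease_list

result <- cbind(my_data, indicator)
```

Because the comparison uses `%in%` on the split tokens rather than a pattern search, a row only gets a 1 when the full disease name is present.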
Answer 1 (score: 1)
An approach using splitstackshape:
library(splitstackshape)
cSplit_e(df, "Disease", sep = ",", mode = "binary", type = "character", fill = 0, drop = FALSE)
Disease Disease_Anemia Disease_Blood Pressure Disease_Diabetes Disease_No
1 Diabetes, Blood Pressure 0 1 1 0
2 Diabetes 0 0 1 0
3 Anemia 1 0 0 0
4 No 0 0 0 1
5 Blood Pressure,Anemia 1 1 0 0
Answer 2 (score: 0)
We can use mtabulate:
library(qdapTools)
cbind(df, mtabulate(strsplit(df$Disease, ",\\s*")))
# Disease Anemia Blood Pressure Diabetes No
#1 Diabetes, Blood Pressure 0 1 1 0
#2 Diabetes 0 0 1 0
#3 Anemia 1 0 0 0
#4 No 0 0 0 1
#5 Blood Pressure,Anemia 1 1 0 0
df <- structure(list(Disease = c("Diabetes, Blood Pressure", "Diabetes",
"Anemia", "No", "Blood Pressure,Anemia")), row.names = c(NA,
-5L), class = "data.frame")