R: is.element inside apply on a column with comma-separated values

Time: 2018-09-14 00:24:39

Tags: r

I have a data frame with a column named Disease, and a list of diseases (the unique values found in the Disease column), like this:

Disease
--------------------------
Diabetes, Blood Pressure
Diabetes
Anemia
No
Blood Pressure,Anemia

I tried using the sapply function as shown below:

xx<-sapply(my_data$Disease, function(x) is.element(toString(stri_split_fixed(x,","))[[1]],unlist(Disease_List))[[1]]  + 0)

Output

> xx
  0 1 1 0 0

A cell with comma-separated values is being treated as a single new value that is not in the list, so 0 is returned for it.
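A small sketch of why this happens. It uses base R's strsplit as a stand-in for stri_split_fixed (an assumption made only to avoid the stringi dependency; both return a list here), and `diseases` is a hypothetical stand-in for Disease_List. toString() collapses the whole split result into one string, so is.element() compares a single combined string against the list and finds no match. Even without toString(), the piece after the comma keeps its leading space and would not match either.

```r
# Base R stand-in for stri_split_fixed(x, ","): both return a list
x <- "Diabetes, Blood Pressure"
pieces <- strsplit(x, ",", fixed = TRUE)

# toString() on that list yields ONE string, not two separate values
collapsed <- toString(pieces)
length(collapsed)

# Hypothetical stand-in for Disease_List
diseases <- c("Diabetes", "Blood Pressure", "Anemia")

# The combined string is not an element of the list, hence the 0
is.element(collapsed, diseases)

# The second piece still carries a leading space until it is trimmed
is.element(pieces[[1]], diseases)
is.element(trimws(pieces[[1]]), diseases)
```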

I need output like this:

Diabetes    Blood Pressure    Anemia
1           1                 0
1           0                 0
0           0                 1
0           0                 0
0           1                 1

3 answers:

Answer 0: (score: 1)

cbind(my_data, +Vectorize(grepl)(disease_list, my_data['Disease']))
                   Disease Diabetes Blood Pressure Anemia No
1 Diabetes, Blood Pressure        1              1      0  0
2                 Diabetes        1              0      0  0
3                   Anemia        0              0      1  0
4                       No        0              0      0  1
5    Blood Pressure,Anemia        0              1      1  0

You can also use +sapply(disease_list, grepl, my_data$Disease)

where

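# Rebuild the example data; sep = '|' keeps each line as a single field
my_data = read.table(col.names = 'Disease',
                     stringsAsFactors = FALSE,
                     strip.white = TRUE,
                     sep = '|',
                     text = ' Diabetes, Blood Pressure
                                            Diabetes
                                               Anemia
                                                   No
                                Blood Pressure,Anemia')
# Unique disease names after splitting on commas and trimming whitespace
disease_list = unique(trimws(unlist(strsplit(as.character(my_data$Disease),','))))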
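Note that grepl() matches substrings, so a list entry that happens to be part of another disease name would also be flagged. If exact matching matters, something along these lines could work. This is only a sketch in base R: the data frame and disease_list are re-created here (without "No", to match the output requested in the question) so that the snippet runs on its own.

```r
# Example data re-created for illustration, mirroring the question
my_data <- data.frame(
  Disease = c("Diabetes, Blood Pressure", "Diabetes", "Anemia", "No",
              "Blood Pressure,Anemia"),
  stringsAsFactors = FALSE
)
# Assumed list of diseases of interest ("No" deliberately left out)
disease_list <- c("Diabetes", "Blood Pressure", "Anemia")

# Split each cell on commas and trim the whitespace around every piece
parts <- lapply(strsplit(my_data$Disease, ",", fixed = TRUE), trimws)

# For each row, test exact membership of every disease in that row's pieces;
# sapply() returns diseases in rows, so transpose to get one row per record
indicators <- t(sapply(parts, function(p) is.element(disease_list, p) + 0))
colnames(indicators) <- disease_list

cbind(my_data, indicators)
```

Because each piece is compared as a whole after trimming, "Blood Pressure" matches whether or not a space follows the comma, and the "No" row comes out as all zeros.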

Answer 1: (score: 1)

An approach using splitstackshape

library(splitstackshape)

cSplit_e(df, "Disease", sep = ",", mode = "binary", type = "character", fill = 0, drop = FALSE)
                   Disease Disease_Anemia Disease_Blood Pressure Disease_Diabetes Disease_No
1 Diabetes, Blood Pressure              0                      1                1          0
2                 Diabetes              0                      0                1          0
3                   Anemia              1                      0                0          0
4                       No              0                      0                0          1
5    Blood Pressure,Anemia              1                      1                0          0

Answer 2: (score: 0)

We can use mtabulate

library(qdapTools)
cbind(df, mtabulate(strsplit(df$Disease, ",\\s*")))
#                    Disease Anemia Blood Pressure Diabetes No
#1 Diabetes, Blood Pressure      0              1        1  0
#2                 Diabetes      0              0        1  0
#3                   Anemia      1              0        0  0
#4                       No      0              0        0  1
#5    Blood Pressure,Anemia      1              1        0  0

Data

df <- structure(list(Disease = c("Diabetes, Blood Pressure", "Diabetes", 
 "Anemia", "No", "Blood Pressure,Anemia")), row.names = c(NA, 
 -5L), class = "data.frame")