Comma-separated string vector to matrix

Time: 2018-06-28 10:12:53

Tags: r

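I have been working on this for an hour and it feels like banging my head against a wall: I want to convert a vector of comma-separated strings into a matrix.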

I have a vector like this:

'ABC,DFGH,IJ'
'KLMN,OP,DFGH,QR'
'ST,ABC'

I want a matrix like this:

ABC DFGH IJ KLMN OP QR ST
1   1    1  0    0  0  0
0   1    0  1    1  1  0
1   0    0  0    0  0  1

Sample data:

myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')

Base R answers are welcome too. I may need this trick again for some larger datasets.

3 answers:

Answer 0: (score: 2)

Another base R solution:

> myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
> mv <- strsplit(myvec,",")
> u <- unique(unlist(mv))
> t(sapply(mv, function(x) u %in% x)*1)
# output without colnames
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    1    1    0    0    0    0
[2,]    0    1    0    1    1    1    0
[3,]    1    0    0    0    0    0    1
> r <- t(sapply(mv, function(x) u %in% x)*1)
# adding colnames 
> colnames(r) <- u
> r
     ABC DFGH IJ KLMN OP QR ST
[1,]   1    1  1    0  0  0  0
[2,]   0    1  0    1  1  1  0
[3,]   1    0  0    0  0  0  1
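
For reuse on larger inputs, the steps above could be wrapped in a small function. This is only a sketch: the name `tokens_to_matrix` and its `sep` argument are made up here and are not part of the answer. It swaps `sapply()` for `vapply()` so that each string is checked to contribute exactly `length(u)` values, and it stores the result as integers.

```r
# Hypothetical helper: name and arguments are assumptions, not from the answer above
tokens_to_matrix <- function(x, sep = ",") {
  parts <- strsplit(x, sep, fixed = TRUE)   # list of token vectors, one per string
  u <- unique(unlist(parts))                # column labels, in order of first appearance
  # one logical vector of length(u) per string
  m <- vapply(parts, function(p) u %in% p, logical(length(u)))
  # arrange as length(u) x length(x), then transpose to one row per string
  m <- t(matrix(m, nrow = length(u)))
  storage.mode(m) <- "integer"              # TRUE/FALSE -> 1/0
  colnames(m) <- u
  m
}

myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')
res <- tokens_to_matrix(myvec)
```

Here `res` has one row per element of `myvec` and one column per distinct token, like the matrix `r` above.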

Answer 1: (score: 1)

library(tidyverse)

myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')

data.frame(myvec) %>%                # create a data frame
  mutate(id = row_number(),          # create row id (helpful in order to reshape)
         value = 1) %>%              # create value = 1 (helpful in order to reshape)
  separate_rows(myvec) %>%           # separate values (using the commas; automatically done by this function)
  spread(myvec, value, fill = 0) %>% # reshape dataset
  select(-id)                        # remove row id column

#   ABC DFGH IJ KLMN OP QR ST
# 1   1    1  1    0  0  0  0
# 2   0    1  0    1  1  1  0
# 3   1    0  0    0  0  0  1
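
The same long-then-wide idea can also be sketched without the tidyverse. The variable names below are illustrative, and `table()` is swapped in for the `separate_rows()`/`spread()` pair. Because `table()` counts each (row id, token) pair, a token repeated within one string would give a count above 1, so the last step caps the counts at 1.

```r
myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')

parts <- strsplit(myvec, ",", fixed = TRUE)

# long format: one (id, token) pair per token;
# fixing the factor levels keeps a row even for a string with no tokens
id    <- factor(rep(seq_along(parts), lengths(parts)), levels = seq_along(parts))
token <- unlist(parts)

# cross-tabulate ids against tokens; columns come out in sorted token order
tab <- table(id, token)

# cap counts at 1 and rebuild a plain integer matrix with token column names
wide <- matrix(as.integer(tab > 0), nrow = nrow(tab),
               dimnames = list(NULL, colnames(tab)))
```

Note that the columns are sorted by token here, whereas the first answer keeps order of first appearance; for this sample data the two orders happen to coincide.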

Answer 2: (score: 1)

You can try this with base R:

Data:

myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')

Solution:

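# collect the unique tokens across all strings
unq <- unique(strsplit(paste0(myvec,collapse=","),",")[[1]])
# note: grepl() does pattern matching, so this assumes no token is contained in another
sapply(unq, function(x)grepl(x,strsplit(myvec,","))+0)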

Output:

> sapply(unq, function(x)grepl(x,strsplit(myvec,","))+0)
     ABC DFGH IJ KLMN OP QR ST
[1,]   1    1  1    0  0  0  0
[2,]   0    1  0    1  1  1  0
[3,]   1    0  0    0  0  0  1
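
One thing to watch with the `grepl()` approach: it does pattern (substring) matching on the list elements after they are coerced to character strings, so a token that is contained in another token also counts as present. A minimal sketch of the difference, using made-up data (`v2` is not from the question) and comparing against exact matching with `%in%`:

```r
v2 <- c('AB,CD', 'ABC,EF')                 # made-up data: "AB" is a substring of "ABC"
parts <- strsplit(v2, ",", fixed = TRUE)
unq <- unique(unlist(parts))               # "AB" "CD" "ABC" "EF"

# substring matching: "AB" is also found in the second string, via "ABC"
by_grepl <- sapply(unq, function(x) grepl(x, parts) + 0)

# exact matching: compare whole tokens with %in%
by_exact <- sapply(unq, function(x) sapply(parts, function(p) x %in% p) + 0)
```

In `by_grepl` the "AB" column is 1 for both strings, while in `by_exact` it is 1 only for the first. For the sample data in the question no token contains another, so both approaches agree there.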