I've been banging my head against this for an hour: I want to convert a vector of comma-separated strings into a matrix.
I have a vector like this:
'ABC,DFGH,IJ'
'KLMN,OP,DFGH,QR'
'ST,ABC'
and I want a matrix like this:
ABC DFGH IJ KLMN OP QR ST
1 1 1 0 0 0 0
0 1 0 1 1 1 0
1 0 0 0 0 0 1
Sample data:
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
Base R answers are welcome too. I may need this trick again for some larger datasets.
Answer 0 (score: 2)
Another base R solution:
> myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
> mv <- strsplit(myvec,",")
> u <- unique(unlist(mv))
> t(sapply(mv, function(x) u %in% x)*1)
# output without colnames
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
> r <- t(sapply(mv, function(x) u %in% x)*1)
# adding colnames
> colnames(r) <- u
> r
ABC DFGH IJ KLMN OP QR ST
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
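One caveat with the `sapply()` version: when there is only one distinct token, `sapply()` simplifies its result to a plain vector, and `t()` then returns a 1 x n matrix instead of an n x 1 one. Below is a minimal sketch of the same `%in%` idea wrapped in a helper (the name `comma_to_indicator` is hypothetical) that uses `vapply()` plus an explicit `matrix()` call so the shape stays fixed. It also trims stray spaces such as `'ABC, DFGH'`, which is an assumption about messier input, not something the question asks for.

```r
# Hypothetical helper (not from the answer above): comma-separated strings -> 0/1 matrix
comma_to_indicator <- function(x, sep = ",") {
  # split each string on the literal separator, then trim surrounding spaces
  tokens <- lapply(strsplit(x, sep, fixed = TRUE), trimws)
  # distinct tokens, in order of first appearance, become the columns
  u <- unique(unlist(tokens))
  # vapply() checks that every result is an integer vector of length(u)
  flat <- vapply(tokens, function(tk) as.integer(u %in% tk), integer(length(u)))
  # matrix() restores a tokens-by-strings shape even when length(u) == 1,
  # then t() gives one row per input string
  t(matrix(flat, nrow = length(u), dimnames = list(u, NULL)))
}

myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')
m <- comma_to_indicator(myvec)
m
```

Because `sep` is matched with `fixed = TRUE`, it is treated as a literal string rather than a regular expression.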
Answer 1 (score: 1)
library(tidyverse)
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
data.frame(myvec, stringsAsFactors = FALSE) %>% # create a data frame (keeps strings as character in R < 4.0)
  mutate(id = row_number(), # create row id (helpful in order to reshape)
         value = 1) %>% # create value = 1 (helpful in order to reshape)
  separate_rows(myvec, sep = ",") %>% # split each string at the commas into separate rows (the default sep splits on any non-alphanumeric character)
  spread(myvec, value, fill = 0) %>% # reshape to wide: one column per token, 0 where absent
  select(-id) # remove row id column
# ABC DFGH IJ KLMN OP QR ST
# 1 1 1 1 0 0 0 0
# 2 0 1 0 1 1 1 0
# 3 1 0 0 0 0 0 1
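`spread()` still works, but it is superseded in current tidyr. Here is a minimal sketch of the same pipeline with `pivot_wider()`, assuming tidyr 1.1.0 or later (where `values_fill` accepts a scalar). Unlike `spread()`, `pivot_wider()` orders the new columns by first appearance rather than alphabetically. For this sample data both orders coincide, and `names_sort = TRUE` restores alphabetical order if you need it.

```r
library(dplyr)
library(tidyr)

myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')

res <- tibble(myvec = myvec) %>%
  mutate(id = row_number(), value = 1L) %>%      # row id + indicator value for reshaping
  separate_rows(myvec, sep = ",") %>%            # one row per (id, token) pair
  pivot_wider(names_from = myvec, values_from = value,
              values_fill = 0L) %>%              # one column per token, 0 where absent
  select(-id)                                    # drop the helper id column

as.matrix(res)  # plain matrix, if a data frame is not what you need
```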
Answer 2 (score: 1)
You can try base R:
Data:
myvec<-c('ABC,DFGH,IJ','KLMN,OP,DFGH,QR','ST,ABC')
Solution:
unq <- unique(strsplit(paste0(myvec,collapse=","),",")[[1]])
sapply(unq, function(x)grepl(x,strsplit(myvec,","))+0)
Output:
> sapply(unq, function(x)grepl(x,strsplit(myvec,","))+0)
ABC DFGH IJ KLMN OP QR ST
[1,] 1 1 1 0 0 0 0
[2,] 0 1 0 1 1 1 0
[3,] 1 0 0 0 0 0 1
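Note that `grepl()` does substring (regular-expression) matching. Here it is applied to a list, which `as.character()` coerces into strings such as `c("ABC", "DFGH", "IJ")`. That works for the sample data, but a token contained in another token (e.g. `AB` and `ABC`) yields false positives, and tokens with regex metacharacters such as `.` can match unintended text. Below is a sketch that tests whole-token membership with `%in%` instead. The helper name `token_matrix` and the `bad` example vector are made up for illustration.

```r
myvec <- c('ABC,DFGH,IJ', 'KLMN,OP,DFGH,QR', 'ST,ABC')

# Hypothetical helper: exact token membership via %in% instead of grepl()
token_matrix <- function(x) {
  tokens <- strsplit(x, ",", fixed = TRUE)   # one character vector per input string
  unq <- unique(unlist(tokens))              # distinct tokens, in order of first appearance
  # for each token, 1 if it is one of the tokens of each string, else 0
  sapply(unq, function(tok) vapply(tokens, function(tk) tok %in% tk, logical(1)) + 0)
}

res <- token_matrix(myvec)

# Substring pitfall: grepl() finds "AB" inside "ABC", %in% does not
bad <- c('AB,CD', 'ABC')
bad_unq <- unique(strsplit(paste0(bad, collapse = ","), ",")[[1]])
bad_grepl <- sapply(bad_unq, function(x) grepl(x, strsplit(bad, ",")) + 0)
bad_exact <- token_matrix(bad)
bad_grepl[, "AB"]  # 1 1 (row 2 is a false positive)
bad_exact[, "AB"]  # 1 0
```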