My dataset, stored in a CSV file, looks like this:
X Colour Orange Red White Violet Black Yellow Blue
1 1 Orange, Red NA NA NA NA NA NA NA
2 2 Red NA NA NA NA NA NA NA
3 3 White, Black NA NA NA NA NA NA NA
4 4 Yellow NA NA NA NA NA NA NA
5 5 Blue, Orange, Violet NA NA NA NA NA NA NA
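I am trying to fill in 0 or 1 for each row/column match: 1 where the colour named by the column appears in that row's Colour value, and 0 otherwise. The expected result is: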
Colour Orange Red White Violet Black Yellow Blue
1 Orange,Red 1 1 0 0 0 0 0
2 Red 0 1 0 0 0 0 0
3 White,Black 0 0 1 0 1 0 0
4 Yellow 0 0 0 0 0 1 0
5 Blue,Orange,Violet 1 0 0 1 0 0 1
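How can I achieve this in R?
Answer 0: (score: 6)
Loop over the column names and use grepl to check whether each name occurs in the corresponding Colour string. Adding 0 converts the resulting logical matrix to 0/1: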
dat[-(1:2)] <- sapply( colnames(dat[-(1:2)]), grepl, x=dat$Colour ) + 0
# X Colour Orange Red White Violet Black Yellow Blue
#1 1 Orange, Red 1 1 0 0 0 0 0
#2 2 Red 0 1 0 0 0 0 0
#3 3 White, Black 0 0 1 0 1 0 0
#4 4 Yellow 0 0 0 0 0 1 0
#5 5 Blue, Orange, Violet 1 0 0 1 0 0 1
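One caveat with the grepl approach is that it matches substrings, so a column named Red would also match a value such as "DarkRed". The sketch below is one possible way to guard against that with word boundaries. The data frame is a hand-built stand-in for the CSV from the question, and the "DarkRed" value and the names `colours` and `flags` are made up for illustration.

```r
# Stand-in for the CSV data from the question (hypothetical reconstruction).
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

# Wrap each colour name in \b (word boundary) so only whole words match.
flags <- sapply(colours, function(nm) {
  grepl(paste0("\\b", nm, "\\b"), dat$Colour)
}) + 0  # "+ 0" turns the logical matrix into 0/1

# Add the indicator columns next to the original ones.
dat[colours] <- flags

# Illustration of the difference on a made-up value:
plain_match <- grepl("Red", "DarkRed")           # substring match
exact_match <- grepl("\\bRed\\b", "DarkRed")     # whole-word match
```

Here `plain_match` is TRUE while `exact_match` is FALSE. For the data shown in the question both variants give the same indicator columns, so the extra anchoring only matters when colour names can overlap.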
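Answer 1: (score: 3)
I'm not sure whether the NA columns have already been added. Even without those NA placeholder columns, we can split the 'Colour' column with strsplit, apply mtabulate to the resulting list, and, if needed, reorder the output by the column names of 'dat':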
library(qdapTools)
cbind(dat[1:2], mtabulate(strsplit(dat$Colour, ', ')))[names(dat)]
# X Colour Orange Red White Violet Black Yellow Blue
#1 1 Orange, Red 1 1 0 0 0 0 0
#2 2 Red 0 1 0 0 0 0 0
#3 3 White, Black 0 0 1 0 1 0 0
#4 4 Yellow 0 0 0 0 0 1 0
#5 5 Blue, Orange, Violet 1 0 0 1 0 0 1
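If installing qdapTools is not an option, the same split-then-tabulate idea can be sketched in base R. This is only a rough equivalent, not what mtabulate does internally. The data frame is a hand-built stand-in for the question's CSV, and `colours`, `parts` and `indicators` are names chosen here for illustration. The as.character call is an assumption that Colour might have been read in as a factor, which strsplit does not accept.

```r
# Stand-in for the CSV data from the question (hypothetical reconstruction).
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

# Split each Colour string on commas, tolerating optional spaces after them.
parts <- strsplit(as.character(dat$Colour), ",\\s*")

# For every row, mark which of the known colours are present (1) or absent (0).
# sapply returns one column per row, so transpose to get one row per record.
indicators <- t(sapply(parts, function(p) as.integer(colours %in% p)))
colnames(indicators) <- colours

# Bind the indicator columns next to the original ones.
result <- cbind(dat, indicators)
```

Because the column order comes from `colours`, no reordering step is needed afterwards. Unlike grepl, the `%in%` test compares whole tokens, so overlapping names do not produce false matches.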
A similar approach is to use cSplit_e from splitstackshape:
library(splitstackshape)
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)