Append new columns to a data table based on matching categories

Time: 2015-11-26 10:01:42

Tags: r

I have a table as follows:

Ab1        Ab2         Ab3
Meronem    Eus         Biclar
Aug        Tazocin  
Aug        Pc
Aug        Eth         Amukin
Aug     
Tazocin     
Kefzol     Avelox      Meronem
Aug        Amukin      Tazocin
Kefzol     Tazocin  

I want to classify the rows of this table using the following lookup, which maps each Ab to its Ab class:

Ab           Ab class
Aug          bl
Pentrexyl    Pcl
Zitromax     Mld
Azactam      Mb
Kefzol       Cp
Biclar       Mld
Eth          Mld
Pc           Pcl
Meronem      Cb
Tazocin      bl
Amukin       Am
Eus          Ts

I would like the final table to look like this, with one count column per Ab class:

Ab1      Ab2      Ab3      bl    Pcl    Mld    Mb    Cp    Cb    Am    Ts  
Meronem  Eus      Biclar   0     0      1      0     0     1     0     1
Aug      Tazocin           2     0      0      0     0     0     0     0
Aug      Pc                1     1      0      0     0     0     0     0
Aug      Eth      Amukin   1     0      1      0     0     0     1     0

I tried assigning values to keys and matching them, but I could not find a solution. Any help would be appreciated.

1 answer:

Answer 0: (score: 2)

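Here is a solution that uses some data reshaping to get the second part (the Ab classes) into the right format, and then simply binds it to the first part.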

library(reshape2)

# add a line ID
ab$line_ID <- seq_len(nrow(ab))

# turn to long format
ab_long <- melt(ab, id.vars = "line_ID", value.name = "Ab")

# merge with the Ab class data (dropping NAs for convenience)
ab_long_merge <- merge(ab_long[!is.na(ab_long$Ab), ], ab_classes, by = "Ab", all.x = TRUE)

# count the classes per line and spread to wide format using dcast
ab_wide_merge <- dcast(ab_long_merge, line_ID ~ Abclass, fun.aggregate = length, value.var = "Abclass")[, -1]  # -1 drops the line_ID column

# create our desired output
output <- cbind(ab[, 1:3], ab_wide_merge)

> output
      Ab1     Ab2     Ab3 Am bl Cb Cp Mld Pcl Ts NA
1 Meronem     Eus  Biclar  0  0  1  0   1   0  1  0
2     Aug Tazocin    <NA>  0  2  0  0   0   0  0  0
3     Aug      Pc    <NA>  0  1  0  0   0   1  0  0
4     Aug     Eth  Amukin  1  1  0  0   1   0  0  0
5     Aug    <NA>    <NA>  0  1  0  0   0   0  0  0
6 Tazocin    <NA>    <NA>  0  1  0  0   0   0  0  0
7  Kefzol  Avelox Meronem  0  0  1  1   0   0  0  1
8     Aug  Amukin Tazocin  1  2  0  0   0   0  0  0
9  Kefzol Tazocin    <NA>  0  1  0  1   0   0  0  0
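
Note that the output above has no Mb column (no row uses Azactam) and gains an NA column (Avelox has no entry in ab_classes). As a rough alternative sketch, not the answer's method, the same counts can be built in base R with match() and table(), which keeps one column per class in the order of the lookup table. The names drug_mat, class_levels, counts and output_base are illustrative only, and the data is re-created with data.frame() so that the block runs on its own.

```r
# Same sample data as in the answer, re-created so this block is self-contained.
ab <- data.frame(
  Ab1 = c("Meronem", "Aug", "Aug", "Aug", "Aug", "Tazocin", "Kefzol", "Aug", "Kefzol"),
  Ab2 = c("Eus", "Tazocin", "Pc", "Eth", NA, NA, "Avelox", "Amukin", "Tazocin"),
  Ab3 = c("Biclar", NA, NA, "Amukin", NA, NA, "Meronem", "Tazocin", NA),
  stringsAsFactors = FALSE
)
ab_classes <- data.frame(
  Ab = c("Aug", "Pentrexyl", "Zitromax", "Azactam", "Kefzol", "Biclar",
         "Eth", "Pc", "Meronem", "Tazocin", "Amukin", "Eus"),
  Abclass = c("bl", "Pcl", "Mld", "Mb", "Cp", "Mld",
              "Mld", "Pcl", "Cb", "bl", "Am", "Ts"),
  stringsAsFactors = FALSE
)

# All classes in order of first appearance, so unused ones (Mb) keep a column.
class_levels <- unique(ab_classes$Abclass)

# Look up the class of every cell; NA cells and unknown names (Avelox) give NA.
drug_mat  <- as.matrix(ab[, c("Ab1", "Ab2", "Ab3")])
class_vec <- ab_classes$Abclass[match(as.vector(drug_mat), ab_classes$Ab)]

# Row index of every cell, in the same column-major order as class_vec.
row_id <- factor(as.vector(row(drug_mat)), levels = seq_len(nrow(drug_mat)))

# Cross-tabulate row against class; table() drops NA classes by default.
counts <- table(row = row_id, class = factor(class_vec, levels = class_levels))

# Bind the counts to the original three columns.
counts_df   <- as.data.frame(unclass(counts))
output_base <- cbind(ab, counts_df)
print(output_base)
```

Unlike the dcast() result, unmatched names are simply not counted here, so there is no NA column; whether that is desirable depends on how you want to treat antibiotics missing from the lookup.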

Data used:

ab <- structure(list(Ab1 = c("Meronem", "Aug", "Aug", "Aug", "Aug", 
"Tazocin", "Kefzol", "Aug", "Kefzol"), Ab2 = c("Eus", "Tazocin", 
"Pc", "Eth", NA, NA, "Avelox", "Amukin", "Tazocin"), Ab3 = c("Biclar", 
NA, NA, "Amukin", NA, NA, "Meronem", "Tazocin", NA)), .Names = c("Ab1", 
"Ab2", "Ab3"), row.names = c(NA, -9L), class = "data.frame")

ab_classes <- structure(list(Ab = structure(c(2L, 10L, 12L, 3L, 7L, 4L, 5L, 
9L, 8L, 11L, 1L, 6L), .Label = c("Amukin", "Aug", "Azactam", 
"Biclar", "Eth", "Eus", "Kefzol", "Meronem", "Pc", "Pentrexyl", 
"Tazocin", "Zitromax"), class = "factor"), Abclass = structure(c(2L, 
7L, 6L, 5L, 4L, 6L, 6L, 7L, 3L, 2L, 1L, 8L), .Label = c("Am", 
"bl", "Cb", "Cp", "Mb", "Mld", "Pcl", "Ts"), class = "factor")), .Names = c("Ab", 
"Abclass"), class = "data.frame", row.names = c(NA, -12L))
#read in using read.table
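
The NA column in the output comes from names that have no entry in ab_classes. A small check along these lines, assuming the same column names as above, lists such names before merging; unmatched is just an illustrative name, and the data is repeated in compact form so the snippet runs on its own.

```r
# Compact copy of the sample data (same values as in the answer).
ab <- data.frame(
  Ab1 = c("Meronem", "Aug", "Aug", "Aug", "Aug", "Tazocin", "Kefzol", "Aug", "Kefzol"),
  Ab2 = c("Eus", "Tazocin", "Pc", "Eth", NA, NA, "Avelox", "Amukin", "Tazocin"),
  Ab3 = c("Biclar", NA, NA, "Amukin", NA, NA, "Meronem", "Tazocin", NA),
  stringsAsFactors = FALSE
)
ab_classes <- data.frame(
  Ab = c("Aug", "Pentrexyl", "Zitromax", "Azactam", "Kefzol", "Biclar",
         "Eth", "Pc", "Meronem", "Tazocin", "Amukin", "Eus"),
  Abclass = c("bl", "Pcl", "Mld", "Mb", "Cp", "Mld",
              "Mld", "Pcl", "Cb", "bl", "Am", "Ts"),
  stringsAsFactors = FALSE
)

# Collect every non-missing name used in the three Ab columns.
drugs <- unlist(ab[, c("Ab1", "Ab2", "Ab3")], use.names = FALSE)
drugs <- drugs[!is.na(drugs)]

# Names that appear in the data but not in the lookup table.
unmatched <- setdiff(drugs, ab_classes$Ab)
print(unmatched)
```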