Combinations of each string in one column with each string in another column in R

Time: 2017-03-31 15:15:16

Tags: r

The real table has many rows, but to keep the question simple...

dt1 <-data.frame(col1=c("C,Y,M","B,C,M,A"),col2=c("B,E,M","B,A,G"),col3=c("2","10"))

     col1          col2         col3
1    C,Y,M         B,E,M        2
2    B,C,M,A       B,F,G        10

What I want to do is:

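1. Within each row, every string in col1 should be paired with every string in col2, except that strings common to both columns are ignored. For example, C pairs with B and C pairs with E, but not C with M, because M appears in both columns of that row; likewise Y pairs with B and Y pairs with E, but again not with M.

2. Each pair should carry the corresponding value from col3.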

Desired output table:

dt2 <- data.frame(col1 =c("C","C","Y","Y","C","C","M","M","A","A"),col2 = c("B","E","B","E","F","G","F","G","F","G"),col3=c("2","2","2","2","10","10","10","10","10","10"))

  col1      col2       col3
1    C         B          2  
2    C         E          2
3    Y         B          2
4    Y         E          2
5    C         F         10
6    C         G         10
7    M         F         10
8    M         G         10
9    A         F         10
10   A         G         10

1 answer:

Answer 0 (score: 2)

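Perhaps you could try something like this (note that there is an error in your sample data: the data.frame() call has "B,A,G" for col2 in row 2, while the printed table shows B,F,G, which is what the code below uses):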

dt1 <- data.frame(col1 = c("C,Y,M","B,C,M,A"), 
                  col2 = c("B,E,M","B,F,G"), 
                  col3 = c("2","10"))

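# Split each column's comma-separated strings into a list of character vectors
x <- lapply(dt1, function(x) strsplit(as.character(x), ",", TRUE))

# For one row: drop the strings common to both columns, then pair the rest
myFun <- function(x, y, z) {
  drop <- intersect(x, y)
  expand.grid(x[!x %in% drop], y[!y %in% drop], z)
}

# Apply row by row and stack the per-row results
do.call(rbind, Map(myFun, x[[1]], x[[2]], x[[3]]))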
#    Var1 Var2 Var3
# 1     C    B    2
# 2     Y    B    2
# 3     C    E    2
# 4     Y    E    2
# 5     C    F   10
# 6     M    F   10
# 7     A    F   10
# 8     C    G   10
# 9     M    G   10
# 10    A    G   10
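
The result above has the right pairs, but the columns are named Var1, Var2, Var3 and the rows come out in expand.grid order (first column varying fastest) rather than the order of the desired table. If the exact layout matters, a variation along the following lines might work. This is only a sketch: `pair_row` is a hypothetical helper name, not part of the original answer, and it assumes the columns are plain character strings (hence `stringsAsFactors = FALSE`).

```r
# Same sample data as in the answer (col2 of row 2 is "B,F,G")
dt1 <- data.frame(col1 = c("C,Y,M", "B,C,M,A"),
                  col2 = c("B,E,M", "B,F,G"),
                  col3 = c("2", "10"),
                  stringsAsFactors = FALSE)

# pair_row is a hypothetical helper: it handles a single row of dt1
pair_row <- function(a, b, value) {
  x <- strsplit(a, ",", fixed = TRUE)[[1]]
  y <- strsplit(b, ",", fixed = TRUE)[[1]]
  # Remove strings that occur in both columns of this row
  common <- intersect(x, y)
  x <- x[!x %in% common]
  y <- y[!y %in% common]
  # each = makes col1 vary slowest, matching the order of the desired table
  data.frame(col1 = rep(x, each = length(y)),
             col2 = rep(y, times = length(x)),
             col3 = rep(value, length(x) * length(y)),
             stringsAsFactors = FALSE)
}

dt2 <- do.call(rbind, Map(pair_row, dt1$col1, dt1$col2, dt1$col3))
# Map() names the pieces after the col1 strings; reset to plain 1..n row names
rownames(dt2) <- NULL
dt2
```

Building the columns with `rep()` instead of `expand.grid()` is just one way to control the row order; sorting the `expand.grid()` result afterwards would be another.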