Aggregating when values appear in either order across two columns of a data.frame in R

Date: 2017-06-26 19:02:27

Tags: r data.table

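I have a table. When my columns only hold data in one direction (A to B), I can aggregate without trouble, but I would like to know whether there is a way to aggregate (or to use dplyr) when the values appear in both directions across column A and column B. For example, a pair of values in columns A and B can appear in either direction (A to B or B to A).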

library(data.table)
exampleset <- data.table(ColumnA = c("A2","A1","A3","A3","A4","A5"),
                         ColumnB = c("A1","A2","A4","A3","A3","A5"),
                         Colorcode = c("red","green","blue","yellow","red","red"))

Desired output:

output <- data.table(ColumnA =c("A1","A3","A3","A5"),
                     ColumnB=c("A2","A4","A3","A5"),
                     ColorcodeCount =c(2,2,1,1))

1 answer:

Answer 0 (score: 0)

For this particular case, the best solution is David Arenburg's, which uses pmin/pmax:

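# Count distinct colors per pair; the result column is named ColorcodeCount to match the desired output
exampleset[, .(ColorcodeCount = uniqueN(Colorcode)), by = .(ColumnA = do.call(pmin, list(ColumnA, ColumnB)), 
                                                            ColumnB = do.call(pmax, list(ColumnA, ColumnB)))]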
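To make it clearer why pmin/pmax works here, the following base R sketch applies both functions to the two columns from the question. The names lo and hi are made up for this illustration. pmin takes the element-wise smaller string and pmax the element-wise larger one, so every pair ends up in the same order regardless of its original direction.

```r
# The two columns from the question, as plain character vectors
ColumnA <- c("A2", "A1", "A3", "A3", "A4", "A5")
ColumnB <- c("A1", "A2", "A4", "A3", "A3", "A5")

# Element-wise minimum and maximum (string comparison)
lo <- pmin(ColumnA, ColumnB)
hi <- pmax(ColumnA, ColumnB)

# Rows 1 and 2 both become ("A1", "A2"); rows 3 and 5 both become ("A3", "A4")
print(data.frame(lo, hi))
```

Grouping by lo and hi therefore treats A1/A2 and A2/A1 as the same pair.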

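However, this does not generalize well to cases where you want to order across 3 columns instead of 2, because pmin/pmax only give you the smallest and largest value in each row.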
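As a rough sketch of what a three-column version could look like, the example below sorts the values within each row using apply and groups by the sorted columns. The data, the extra column ColumnC, and the names key_cols, sorted and sorted_names are all hypothetical and are not part of the original question. It also assumes there are no NA values in the key columns, since sort drops them.

```r
library(data.table)

# Hypothetical three-column data (ColumnC is not in the original question)
dt <- data.table(ColumnA = c("A2", "A1", "A3"),
                 ColumnB = c("A1", "A3", "A1"),
                 ColumnC = c("A3", "A2", "A2"),
                 Colorcode = c("red", "green", "red"))

key_cols <- c("ColumnA", "ColumnB", "ColumnC")

# Sort the values within each row. apply() returns one column per input row,
# so transpose to get back to one row per input row.
sorted <- t(apply(as.matrix(dt[, ..key_cols]), 1, sort))

# Attach the sorted values as new grouping columns
sorted_names <- paste0("sorted", seq_along(key_cols))
dt[, (sorted_names) := as.data.table(sorted)]

# All three rows contain the same set of values, so they form a single group
res <- dt[, .(ColorcodeCount = uniqueN(Colorcode)), by = sorted_names]
print(res)
```

The same pattern should work for any number of key columns, at the cost of converting them to a matrix first.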

Alternatively, here is my solution using mapply (updated with apply):

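You can create columns that always hold the pair in the same order (so A1/A2 is treated the same as A2/A1), and then group by those unordered columns. Something like: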

exampleset2 <- exampleset[,c("unorderA","unorderB") := data.frame(t(mapply(FUN = function(...) c(...)[order(c(...))], ColumnA, ColumnB, USE.NAMES = FALSE)))]
exampleset2[,list(ColorcodeCount = length(unique(Colorcode))), by = .(unorderA, unorderB)]

#   unorderA unorderB ColorcodeCount
#1:       A1       A2              2
#2:       A3       A4              2
#3:       A3       A3              1
#4:       A5       A5              1

On the other hand, if you want to do everything in a single call, another way is:

exampleset[,list(ColorcodeCount = length(unique(Colorcode))), 
           by = .(t(mapply(FUN = function(...) c(...)[order(c(...))], ColumnA, ColumnB, USE.NAMES = FALSE))[,1],
                  t(mapply(FUN = function(...) c(...)[order(c(...))], ColumnA, ColumnB, USE.NAMES = FALSE))[,2])]

#    t t.1 ColorcodeCount
#1: A1  A2              2
#2: A3  A4              2
#3: A3  A3              1
#4: A5  A5              1
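The single call above evaluates the same mapply expression twice. A minimal sketch, assuming it is acceptable to keep the sorted pairs in a temporary matrix (the name sorted_pair is made up here), computes them once and also gives the grouping columns readable names instead of t and t.1:

```r
library(data.table)

exampleset <- data.table(ColumnA = c("A2", "A1", "A3", "A3", "A4", "A5"),
                         ColumnB = c("A1", "A2", "A4", "A3", "A3", "A5"),
                         Colorcode = c("red", "green", "blue", "yellow", "red", "red"))

# Sort each (ColumnA, ColumnB) pair once; the result is an n x 2 character matrix
sorted_pair <- t(mapply(function(a, b) sort(c(a, b)),
                        exampleset$ColumnA, exampleset$ColumnB,
                        USE.NAMES = FALSE))

# Group by the two sorted columns and count distinct colors per pair
res <- exampleset[, .(ColorcodeCount = uniqueN(Colorcode)),
                  by = .(ColumnA = sorted_pair[, 1],
                         ColumnB = sorted_pair[, 2])]
print(res)
```

This leaves exampleset itself unchanged, since no columns are added by reference.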