Combine row values based on another value in R

Time: 2015-04-07 08:22:08

Tags: r combinatorics

I need to make a network visualization. I have the data, but it is not in the right format! The data in my R data frame looks like this:
Title       Name
Article1    Johnson
Article1    Hansson
Article1    Michaels
Article2    Nielsson
Article2    Madsen
Article2    Shannon
Article2    Paddington
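To run the answers below, the example data can be entered as a data frame named `df`, which is the name both answers assume. This is a minimal sketch. `stringsAsFactors = FALSE` is an addition that keeps `Name` a character column in R versions before 4.0.0, where it was not the default.

```r
# The question's example data: one row per (article, author) pair.
df <- data.frame(
  Title = c(rep("Article1", 3), rep("Article2", 4)),
  Name  = c("Johnson", "Hansson", "Michaels",
            "Nielsson", "Madsen", "Shannon", "Paddington"),
  stringsAsFactors = FALSE  # keep Name as character, not factor
)
```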

I want to find the combinations of names within each title (i.e. the co-authors), giving output in this format:

Source     Target      Title
Johnson    Hansson     Article1
Johnson    Michaels    Article1
Hansson    Michaels    Article1
Nielsson   Madsen      Article2
Nielsson   Shannon     Article2
Nielsson   Paddington  Article2
Madsen     Shannon     Article2
Madsen     Paddington  Article2
Shannon    Paddington  Article2

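The network is undirected, so Source/Target are just column names for illustration. How can I do this in R? I'm sure there is a simple way, but I can't find it.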
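Because the network is undirected, an article with k authors contributes choose(k, 2) unordered pairs: 3 for Article1 and 6 for Article2, or 9 rows in total. Base R's `combn()` enumerates exactly these pairs, and both answers build on it. A minimal illustration using Article1's authors (the variable names `authors` and `pairs` are only for this sketch):

```r
# The three authors of Article1 from the question.
authors <- c("Johnson", "Hansson", "Michaels")

# combn() returns a 2 x choose(3, 2) matrix with one unordered pair per column,
# in index order (1,2), (1,3), (2,3): Johnson-Hansson, Johnson-Michaels, Hansson-Michaels.
pairs <- combn(authors, 2)
t(pairs)  # one pair per row, matching the Source/Target rows wanted for Article1

# Expected total number of rows: 3 pairs for Article1 plus 6 for Article2.
sum(choose(c(3, 4), 2))  # 9
```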

2 answers:

Answer 0 (score: 4)

Here is a possible solution using data.table v >= 1.9.5 and the new tstrsplit function:
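library(data.table) # v >= 1.9.5
# setDT() converts df to a data.table by reference. For each Title, combn() builds
# every pair of names as one "Name1, Name2" string (via toString), and tstrsplit()
# splits those strings back into two columns, named Source and Target.
# Note: this breaks if a name itself contains ", ".
setDT(df)[, setNames(tstrsplit(combn(Name, 2, toString, simplify = FALSE), ", "),
                     c("Source", "Target")),
          by = Title]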
#       Title   Source     Target
# 1: Article1  Johnson    Hansson
# 2: Article1  Johnson   Michaels
# 3: Article1  Hansson   Michaels
# 4: Article2 Nielsson     Madsen
# 5: Article2 Nielsson    Shannon
# 6: Article2 Nielsson Paddington
# 7: Article2   Madsen    Shannon
# 8: Article2   Madsen Paddington
# 9: Article2  Shannon Paddington

Answer 1 (score: 2)

Try this in base R:
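combos <- tapply(df$Name, df$Title, function(x) t(combn(x, 2)))  # per Title: one row per pair of names
cbind(setNames(as.data.frame(do.call(rbind, combos)), c("Source", "Target")),  # stack all pairs
      Title = rep(names(combos), vapply(combos, nrow, 1L)))  # repeat each Title once per pair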

#    Source     Target    Title
#1  Johnson    Hansson Article1
#2  Johnson   Michaels Article1
#3  Hansson   Michaels Article1
#4 Nielsson     Madsen Article2
#5 Nielsson    Shannon Article2
#6 Nielsson Paddington Article2
#7   Madsen    Shannon Article2
#8   Madsen Paddington Article2
#9  Shannon Paddington Article2
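Both answers call `combn(x, 2)` on every title, which stops with the error "n < m" when a title has only one name. A minimal base R sketch that skips such titles, avoids the string round-trip, and returns the columns in the question's order (Source, Target, Title) might look like this. The function name `coauthor_pairs` and the extra single-author row (`Article3`/`Solo`) are hypothetical, added only to exercise the guard.

```r
# Question's data plus one HYPOTHETICAL single-author article to test the guard.
df <- data.frame(
  Title = c(rep("Article1", 3), rep("Article2", 4), "Article3"),
  Name  = c("Johnson", "Hansson", "Michaels",
            "Nielsson", "Madsen", "Shannon", "Paddington", "Solo"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build an undirected edge list (Source, Target, Title).
coauthor_pairs <- function(df) {
  groups <- split(df$Name, df$Title)  # character vector of names per title
  edges <- lapply(names(groups), function(title) {
    x <- groups[[title]]
    if (length(x) < 2) return(NULL)   # single-author title: no edges
    m <- combn(x, 2)                  # 2 x choose(length(x), 2) matrix of pairs
    data.frame(Source = m[1, ], Target = m[2, ], Title = title,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, edges)        # rbind.data.frame skips the NULL entries
  rownames(out) <- NULL
  out
}

res <- coauthor_pairs(df)
res  # 9 rows: 3 for Article1, 6 for Article2, none for Article3
```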