Question

I want to create a co-authorship network using igraph.

My data is organized like a data.frame, as shown below:

DF1 <- cbind(Papers =  paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

I want to create an edge list like this:

   Vertex1 Vertex2
        A       B
        D       C
        C       F
        C       F
        A       C
        D       E
        C       D
        B       C
        C       E
        F       D

Is there any way to do this in R (for example with igraph)?

The following function does the job, but it takes too long on large datasets (more than 5,000 papers):

Fun_DFtoEdgeList <- function (Inputdataframe)
{

  ## This function creates an edge list that can be used to build a network
  ## Input : Dataframe with UNIQUE VALUES !!!!

  ResEdgeList <- data.frame(Vertex1 = c('--'), Vertex2 = c('--'))


  for (i in 1 : (ncol(Inputdataframe)-1))
  {
    for (j in 2: (ncol(Inputdataframe)))
    {
      if (i !=j)     
      {
        #print(paste(i, j, sep ='--'))

        ToAppend <- data.frame(cbind(Inputdataframe[,i], Inputdataframe[,j]))
        names(ToAppend) <- names(ResEdgeList)
        #print(ToAppend)

        ResEdgeList <- rbind(ResEdgeList, ToAppend)
      }
    }

  }

  ResEdgeList <- data.frame(ResEdgeList[-1,], stringsAsFactors = FALSE)
  ResEdgeList<- subset(ResEdgeList, (is.na(Vertex1) == FALSE ) & (is.na(Vertex2) == FALSE ))  
  ResEdgeList
}


Fun_DFtoEdgeList (DF1[,-1])

Any help is appreciated. (I posted this question before under a different title, but I was told I was not clear enough.)

Answer 1

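Your code won't produce the output you showed, because it also iterates over the "Paper" column. It will also turn out to be slow: every time you append to an existing object, R has to make another copy of the whole object, and when you do that iteratively things slow to a crawl. Judging by your output, I think this is what you want:

#First, create all combos of the columns you want. I don't think you want to include the "Paper" column?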

x <- combn(2:4,2)
#-----
     [,1] [,2] [,3]
[1,]    2    2    3
[2,]    3    4    4

#next use apply to go through each pair:
apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]]))
#-----
[[1]]
  Vertex1 Vertex2
1       A       B
2       D       C
3       C       F
4       C    <NA>
5       C       F
....
#So use do.call to rbind them together:

out <- do.call("rbind", 
        apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]])))

#Finally, filter out the rows with NA:
out[complete.cases(out),]
#-----
   Vertex1 Vertex2
1        A       B
2        D       C
3        C       F
5        C       F
6        A       C
7        D       E
10       C       D
11       B       C
12       C       E
15       F       D
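
The steps above can be folded into one small function. This is only a sketch: the name edge_list_by_column is made up for illustration, it assumes the input holds only author columns (at least two of them), and it uses lapply instead of apply so the result is always a list of data frames.

```r
# Hypothetical helper (name is illustrative, not from the answer above).
# Assumes `authors` contains only author columns, at least two of them.
edge_list_by_column <- function(authors) {
  authors <- as.matrix(authors)
  # Each column of `pairs` holds the indices of one pair of author columns
  pairs <- combn(ncol(authors), 2)
  pieces <- lapply(seq_len(ncol(pairs)), function(k) {
    data.frame(Vertex1 = authors[, pairs[1, k]],
               Vertex2 = authors[, pairs[2, k]],
               stringsAsFactors = FALSE)
  })
  # Bind all pieces once, then drop rows where either author is NA
  out <- do.call(rbind, pieces)
  out <- out[complete.cases(out), ]
  rownames(out) <- NULL
  out
}

DF1 <- cbind(Papers  = paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

edges <- edge_list_by_column(DF1[, -1])
print(edges)
```

On the sample data this should give the same ten rows as the edge list in the question, in the same order.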

Finally, see how it scales to a bigger problem:

#Just over a million author entries (333,334 papers x 3 authors)
zz <- matrix(sample(letters, 1000002, TRUE), ncol = 3)
x <- combn(1:3, 2)
system.time(do.call("rbind", 
                    apply(x, 2, function(z) data.frame(Vertex1 = zz[, z[1]], Vertex2 = zz[, z[2]]))))
#-----
user  system elapsed 
  1.332   0.144   1.482

1.5 seconds seems pretty reasonable to me?
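
To make the point about repeated appending concrete, here is a rough sketch that builds the same edge list two ways: growing a data frame row by row with rbind, and collecting the pieces in a preallocated list and binding once. The data, the size n, and the function names grow_by_rbind and collect_then_bind are all made up for illustration. Timings depend on your machine, so none are quoted here.

```r
set.seed(42)
n  <- 500                          # illustrative size; increase to see the gap widen
v1 <- sample(letters, n, TRUE)
v2 <- sample(letters, n, TRUE)

# Pattern 1: append to the result inside the loop.
# Each rbind allocates a new, slightly larger data frame and copies the old rows.
grow_by_rbind <- function() {
  res <- data.frame(Vertex1 = character(0), Vertex2 = character(0),
                    stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    res <- rbind(res, data.frame(Vertex1 = v1[i], Vertex2 = v2[i],
                                 stringsAsFactors = FALSE))
  }
  res
}

# Pattern 2: store each piece in a preallocated list, then bind once at the end.
collect_then_bind <- function() {
  pieces <- vector("list", n)
  for (i in seq_len(n)) {
    pieces[[i]] <- data.frame(Vertex1 = v1[i], Vertex2 = v2[i],
                              stringsAsFactors = FALSE)
  }
  do.call(rbind, pieces)
}

print(system.time(a <- grow_by_rbind()))
print(system.time(b <- collect_then_bind()))

# Both patterns produce the same edges
same_edges <- identical(a$Vertex1, b$Vertex1) && identical(a$Vertex2, b$Vertex2)
print(same_edges)
```

The column-wise approach in this answer goes one step further: it needs no per-row loop at all, only one data frame per pair of columns.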

Answer 2

There may be a better way, but try combn, which generates all unique combinations:

DF1 <- cbind(Papers =  paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

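library(igraph)
# For each paper (row), build every pair of its authors and drop pairs containing NA
l <- apply(DF1[, -1], MARGIN = 1, function(x) na.omit(data.frame(t(combn(x, m = 2)))))
df <- do.call(rbind, l)
# graph_from_data_frame() is the current name of graph.data.frame()
g <- graph_from_data_frame(df, directed = FALSE)
plot(g)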
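
As a sanity check, the per-paper idea can be compared against pairing up whole columns. The sketch below uses base R only and leaves out the igraph step. It is a variation on the code above, not the same code: it drops NA authors before calling combn instead of running na.omit afterwards, and the helper name edge_key is made up for illustration.

```r
DF1 <- cbind(Papers  = paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))
authors <- DF1[, -1]

# Per paper: all pairs among the non-missing authors of each row
per_paper <- do.call(rbind, lapply(seq_len(nrow(authors)), function(i) {
  a <- unname(authors[i, ])
  a <- a[!is.na(a)]
  if (length(a) < 2) return(NULL)   # a single-author paper contributes no edge
  p <- t(combn(a, 2))
  data.frame(Vertex1 = p[, 1], Vertex2 = p[, 2], stringsAsFactors = FALSE)
}))

# Per column pair: pair up whole columns, then drop rows containing NA
pairs <- combn(ncol(authors), 2)
per_column <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
  data.frame(Vertex1 = authors[, pairs[1, k]],
             Vertex2 = authors[, pairs[2, k]],
             stringsAsFactors = FALSE)
}))
per_column <- per_column[complete.cases(per_column), ]

# Hypothetical helper: order-independent summary of an edge list
edge_key <- function(e) sort(paste(e$Vertex1, e$Vertex2, sep = "--"))

same_edges <- identical(edge_key(per_paper), edge_key(per_column))
print(same_edges)
```

On the sample data both routes yield the same ten edges; only the row order differs, which does not matter once the edges are handed to igraph.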
