Creating a co-authorship network in R

Time: 2012-06-30 14:26:11

Tags: r dataframe igraph

I want to create a co-authorship network using igraph.

My data is organized as a data.frame, like this:

# note: cbind() of character vectors returns a character matrix, not a data.frame
DF1 <- cbind(Papers =  paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

I want to create an edge list like this:

   Vertex1 Vertex2
        A       B
        D       C
        C       F
        C       F
        A       C
        D       E
        C       D
        B       C
        C       E
        F       D

Is there any way to do this in R (e.g., with igraph)?

The following function does the job, but it takes far too long on large datasets (more than 5,000 papers):

Fun_DFtoEdgeList <- function (Inputdataframe)
{

  ## This function creates an edge list for building a network
  ## Input: data frame with UNIQUE VALUES !!!!

  ResEdgeList <- data.frame(Vertex1 = c('--'), Vertex2 = c('--'))


  for (i in 1 : (ncol(Inputdataframe)-1))
  {
    # start j at i + 1 so that each pair of columns is used exactly once
    # (the original "j in 2:ncol" with "i != j" repeats pairs once there are 4+ columns)
    for (j in (i + 1) : ncol(Inputdataframe))
    {
      ToAppend <- data.frame(cbind(Inputdataframe[,i], Inputdataframe[,j]))
      names(ToAppend) <- names(ResEdgeList)

      ResEdgeList <- rbind(ResEdgeList, ToAppend)
    }

  }

  ResEdgeList <- data.frame(ResEdgeList[-1,], stringsAsFactors = FALSE)
  ResEdgeList<- subset(ResEdgeList, (is.na(Vertex1) == FALSE ) & (is.na(Vertex2) == FALSE ))  
  ResEdgeList
}


Fun_DFtoEdgeList (DF1[,-1])

Any help is appreciated. (I posted this question earlier under a different title, but was told I wasn't clear enough.)

2 answers:

Answer 0 (score: 3)

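Your code doesn't produce the data you gave, because it is iterating over the "Paper" column. It will also turn out to be slow: every time you append to an existing object, R has to make another copy of the entire object... and when you do that iteratively, things slow to a crawl. Looking at your output, I think this is what you want: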

# First, create all combinations of the columns you want. I don't think you want to include the "Paper" column?

x <- combn(2:4,2)
#-----
     [,1] [,2] [,3]
[1,]    2    2    3
[2,]    3    4    4

#next use apply to go through each pair:
apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]]))
#-----
[[1]]
  Vertex1 Vertex2
1       A       B
2       D       C
3       C       F
4       C    <NA>
5       C       F
....
#So use do.call to rbind them together:

out <- do.call("rbind", 
        apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]])))

#Finally, filter out the rows with NA:
out[complete.cases(out),]
#-----
   Vertex1 Vertex2
1        A       B
2        D       C
3        C       F
5        C       F
6        A       C
7        D       E
10       C       D
11       B       C
12       C       E
15       F       D
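The steps above can be wrapped into a small reusable function. This is a sketch, not part of the original answer: the name `edge_list_from_authors` is hypothetical, and it assumes that only the author columns are passed in (at least two of them). It builds the column pairs from `ncol()` instead of hard-coding `2:4`, so papers with more author slots work unchanged.

```r
# Hypothetical helper (not from the answer): build a co-author edge list
# from a character matrix or data frame whose columns hold author names.
edge_list_from_authors <- function(authors) {
  authors <- as.matrix(authors)
  # Every pair of author columns, one pair per column of `pairs`
  # (assumes at least two author columns)
  pairs <- combn(ncol(authors), 2)
  pieces <- lapply(seq_len(ncol(pairs)), function(k) {
    data.frame(Vertex1 = authors[, pairs[1, k]],
               Vertex2 = authors[, pairs[2, k]],
               stringsAsFactors = FALSE)
  })
  # Bind all pieces in one call instead of growing a data frame in a loop
  out <- do.call(rbind, pieces)
  # Drop pairs where either author slot is empty (NA)
  out <- out[complete.cases(out), ]
  rownames(out) <- NULL
  out
}

DF1 <- cbind(Papers = paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

edges <- edge_list_from_authors(DF1[, -1])
edges
```

On `DF1` this returns the same ten rows as the filtered output above, renumbered 1 to 10.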

Finally, see how it scales to a bigger problem:

# 333,334 papers x 3 authors = 1,000,002 author slots (just over a million edges)
zz <- matrix(sample(letters, 1000002, TRUE), ncol = 3)
x <- combn(1:3, 2)
system.time(do.call("rbind", 
                    apply(x, 2, function(z) data.frame(Vertex1 = zz[, z[1]], Vertex2 = zz[, z[2]]))))
#-----
user  system elapsed 
  1.332   0.144   1.482

1.5 seconds seems reasonable to me.
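If you'd rather keep an explicit loop like the one in the question, most of the copying cost described at the top of this answer goes away when each piece is stored in a preallocated list and `rbind` is called once at the end. A minimal sketch under that assumption; `loop_edge_list` and the small `authors` matrix are hypothetical, and starting `j` at `i + 1` makes each column pair appear exactly once:

```r
# Hypothetical loop-based variant: store each column-pair piece in a
# preallocated list and bind them all in a single call at the end.
loop_edge_list <- function(authors) {
  authors <- as.matrix(authors)
  n <- ncol(authors)                      # assumes n >= 2 author columns
  pieces <- vector("list", choose(n, 2))  # one slot per column pair
  k <- 1
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      pieces[[k]] <- data.frame(Vertex1 = authors[, i],
                                Vertex2 = authors[, j],
                                stringsAsFactors = FALSE)
      k <- k + 1
    }
  }
  out <- do.call(rbind, pieces)  # one bind instead of one per iteration
  out[complete.cases(out), ]     # drop pairs with a missing author
}

# Small hypothetical input: three papers, up to three authors each
authors <- rbind(c('A', 'B', 'C'),
                 c('D', 'C', 'E'),
                 c('C', 'F', NA))
res <- loop_edge_list(authors)
res
```

The loop itself is cheap here; what made the original slow is the `rbind` inside it, which this version performs only once.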

Answer 1 (score: 1)

There may be a better way, but try combn, which generates all unique combinations:

DF1 <- cbind(Papers =  paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

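library(igraph)
# For each paper (row), list every author pair and drop pairs involving an NA author
l <- apply(DF1[, -1], MARGIN = 1, function(x) na.omit(data.frame(t(combn(x, m = 2)))))
df <- do.call(rbind, l)
# graph_from_data_frame() is the current name of the older graph.data.frame()
g <- graph_from_data_frame(df, directed = FALSE)
plot(g)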
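To see what each `apply()` step above works on, here is the same transformation for a single paper. Because `DF1` is a character matrix, `apply(..., MARGIN = 1, ...)` hands the function one row at a time as a character vector; the vector below is Paper3's row (authors C and F, third slot empty). `combn()` returns one column per author pair, `t()` turns those columns into rows, and `na.omit()` drops every pair involving the missing author:

```r
# Authors of Paper3 in DF1 (third author slot is empty)
paper3 <- c(Author1 = 'C', Author2 = 'F', Author3 = NA)

# 2 x 3 character matrix: one column per pair of author slots
pairs <- combn(paper3, m = 2)
pairs

# One row per pair (columns X1, X2); rows containing NA are dropped
edges3 <- na.omit(data.frame(t(pairs), stringsAsFactors = FALSE))
edges3   # only the C-F pair survives
```

Applied to all five rows and bound with `do.call(rbind, ...)`, this gives the same ten edges as the other answer, only ordered paper by paper.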