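I want to create a co-authorship network with igraph.
My data is organized like this (strictly speaking, `cbind()` on character vectors returns a character matrix rather than a data.frame):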
DF1 <- cbind(Papers = paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))
I want to create an edge list like this:
Vertex1 Vertex2
A B
D C
C F
C F
A C
D E
C D
B C
C E
F D
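Is there any way to do this in R (e.g. with igraph)?
The following function does the job, but on large datasets (more than 5,000 papers) it takes far too long to run: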
Fun_DFtoEdgeList <- function(Inputdataframe)
{
  ## This function creates an edge list from which to build a network
  ## Input: data frame with UNIQUE VALUES!
  ResEdgeList <- data.frame(Vertex1 = '--', Vertex2 = '--', stringsAsFactors = FALSE)
  n <- ncol(Inputdataframe)
  for (i in 1:(n - 1))
  {
    for (j in (i + 1):n)  # visit each pair of columns exactly once
    {
      ToAppend <- data.frame(Inputdataframe[, i], Inputdataframe[, j],
                             stringsAsFactors = FALSE)
      names(ToAppend) <- names(ResEdgeList)
      # Appending inside the loop copies the whole object every time
      ResEdgeList <- rbind(ResEdgeList, ToAppend)
    }
  }
  ResEdgeList <- ResEdgeList[-1, ]  # drop the placeholder row
  ResEdgeList <- subset(ResEdgeList, !is.na(Vertex1) & !is.na(Vertex2))
  ResEdgeList
}
Fun_DFtoEdgeList(DF1[, -1])
Any help appreciated. (I posted this question before under a different title, but I was told it wasn't clear enough.)
Answer 0 (score: 3)
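Your code doesn't produce the output you showed, because it iterates over the "Paper" column. It will also turn out to be slow: every time you append to an existing object, R has to make another copy of the whole object, and when you do that iteratively things slow to a crawl. Looking at your expected output, I think this is what you want: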
# First, create all pairwise combinations of the columns you want. I don't think you want to include the "Paper" column?
x <- combn(2:4,2)
#-----
[,1] [,2] [,3]
[1,] 2 2 3
[2,] 3 4 4
# Next, use apply() to go through each pair:
apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]]))
#-----
[[1]]
Vertex1 Vertex2
1 A B
2 D C
3 C F
4 C <NA>
5 C F
....
#So use do.call to rbind them together:
out <- do.call("rbind",
apply(x, 2, function(z) data.frame(Vertex1 = DF1[, z[1]], Vertex2 = DF1[, z[2]])))
#Finally, filter out the rows with NA:
out[complete.cases(out),]
#-----
Vertex1 Vertex2
1 A B
2 D C
3 C F
5 C F
6 A C
7 D E
10 C D
11 B C
12 C E
15 F D
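The column-pair steps above can be wrapped into one function that handles any number of author columns. This is a minimal base-R sketch, not the answerer's code; the name `edges_from_columns` is hypothetical, and it assumes authors sit in columns with `NA` marking empty slots, as in `DF1`.

```r
# Sketch: generalise the combn() + do.call(rbind, ...) approach above.
# `edges_from_columns` is a hypothetical helper name.
edges_from_columns <- function(authors) {
  authors <- as.matrix(authors)       # accepts a matrix or a data.frame
  pairs <- combn(ncol(authors), 2)    # every pair of column indices, one per column
  pieces <- lapply(seq_len(ncol(pairs)), function(k) {
    data.frame(Vertex1 = authors[, pairs[1, k]],
               Vertex2 = authors[, pairs[2, k]],
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)       # bind once, not inside a loop
  out <- out[complete.cases(out), ]   # drop pairs involving a missing author
  rownames(out) <- NULL
  out
}

DF1 <- cbind(Papers = paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))

edges <- edges_from_columns(DF1[, -1])
edges
```

The resulting data frame can be passed straight to igraph, e.g. `graph_from_data_frame(edges, directed = FALSE)`.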
Finally, let's see how it scales to a larger problem:
# 333,334 papers (just over a million author slots spread over 3 columns)
zz <- matrix(sample(letters, 1000002, TRUE), ncol = 3)
x <- combn(1:3, 2)
system.time(do.call("rbind",
apply(x, 2, function(z) data.frame(Vertex1 = zz[, z[1]], Vertex2 = zz[, z[2]]))))
#-----
user system elapsed
1.332 0.144 1.482
1.5 seconds seems reasonable to me?
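The copying cost described at the start of this answer can be seen by timing both append patterns on the same pieces. A minimal sketch on made-up random data; `make_piece`, `grow` and `bind_once` are hypothetical names, and the timings will vary by machine, but the gap widens as the number of pieces grows.

```r
# Sketch: growing a data frame in a loop vs. binding a list once.
# The data are random and only serve to illustrate the two patterns.
set.seed(1)
n_pieces <- 200
make_piece <- function(i) {
  data.frame(Vertex1 = sample(LETTERS, 5, TRUE),
             Vertex2 = sample(LETTERS, 5, TRUE),
             stringsAsFactors = FALSE)
}
pieces <- lapply(seq_len(n_pieces), make_piece)

# Pattern 1: grow the result inside the loop. Each rbind() copies everything
# accumulated so far, so the total copying grows roughly quadratically.
grow <- function(pieces) {
  out <- pieces[[1]]
  for (p in pieces[-1]) out <- rbind(out, p)
  out
}

# Pattern 2: keep the pieces in a list and bind them once at the end.
bind_once <- function(pieces) do.call(rbind, pieces)

system.time(slow <- grow(pieces))
system.time(fast <- bind_once(pieces))
```

Both calls return the same rows; only the amount of intermediate copying differs.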
Answer 1 (score: 1)
There may be a better way, but try `combn`, which generates all unique combinations:
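DF1 <- cbind(Papers = paste('Paper', 1:5, sep = ''),
             Author1 = c('A', 'D', 'C', 'C', 'C'),
             Author2 = c('B', 'C', 'F', NA, 'F'),
             Author3 = c('C', 'E', NA, NA, 'D'))
library(igraph)
# For each paper (row), list every pair of its authors and drop pairs containing NA
l <- apply(DF1[, -1], MARGIN = 1, function(x) na.omit(data.frame(t(combn(x, m = 2)))))
# Stack the per-paper edge lists into one data frame
df <- do.call(rbind, l)
# Undirected graph; graph_from_data_frame() is the current name of graph.data.frame()
g <- graph_from_data_frame(df, directed = FALSE)
plot(g)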
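The graph built above keeps repeated co-authorships as parallel edges (here C-F twice, and D-C plus C-D). If one edge per author pair with a count is preferred, a minimal base-R sketch follows; the edge list is typed in to match what `df` contains (columns `X1`, `X2`) so the block runs on its own. Within igraph, setting `E(g)$weight <- 1` and then calling `simplify(g, edge.attr.comb = list(weight = "sum"))` should give the same weights.

```r
# Sketch: collapse repeated co-authorships into one weighted, undirected edge.
# This edge list mirrors the answer's `df` (one row per author pair per paper).
df <- data.frame(X1 = c("A", "A", "B", "D", "D", "C", "C", "C", "C", "F"),
                 X2 = c("B", "C", "C", "C", "E", "E", "F", "F", "D", "D"),
                 stringsAsFactors = FALSE)

# Undirected graph: sort each pair so that "D C" and "C D" count as the same edge
from <- pmin(df$X1, df$X2)
to   <- pmax(df$X1, df$X2)

# Count how many papers each author pair shares
weighted <- aggregate(list(weight = rep(1L, nrow(df))),
                      by = list(from = from, to = to), FUN = sum)
weighted

# Optional: build the weighted graph if igraph is installed; columns after
# the first two become edge attributes
if (requireNamespace("igraph", quietly = TRUE)) {
  g_w <- igraph::graph_from_data_frame(weighted, directed = FALSE)
  print(igraph::E(g_w)$weight)
}
```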