I have a CSV file that looks like this:
"","people_id","commit_id"
"1",1,0
"2",1,117
"3",1,144
"4",1,278
…
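Here's the CSV file, if you want to take a look. It contains 11735 rows but only 5923 unique people IDs.
Does anyone know how to connect people IDs that share a common "commit_id", while ignoring commit_id 0, since id 0 does not exist?
This is what I have so far: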
library(igraph)
library(Matrix)
# read the csv file
commitsNetwork <- read.csv("commits.csv", header = TRUE)
# keep only the two columns needed for the network
commitsNetwork <- commitsNetwork[c("people_id", "commit_id")]
# build the person-by-commit incidence matrix (rows: people, columns: commits)
C <- spMatrix(nrow = length(unique(commitsNetwork$people_id)),
              ncol = length(unique(commitsNetwork$commit_id)),
              i = as.numeric(factor(commitsNetwork$people_id)),
              j = as.numeric(factor(commitsNetwork$commit_id)),
              x = rep(1, nrow(commitsNetwork)))
row.names(C) <- levels(factor(commitsNetwork$people_id))
colnames(C) <- levels(factor(commitsNetwork$commit_id))
# project onto people: entry [a, b] counts the commit_ids shared by a and b
adjC <- tcrossprod(C)
comG <- graph.adjacency(adjC, mode = "undirected", weighted = TRUE, diag = FALSE)
# write to a Pajek file
write.graph(comG, "comNetwork.net", format = "pajek")
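Also, the edges come from the "commit_id" column: two vertices (people) are connected when they share a common commit_id.
So I'm not sure how to generate the network from this CSV file in R.
The ideal output should be:
*Vertices 5923
1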
2
3
4
...
*Edges
1 4 1
1 25 1
1 39 1
1 41 1
1 48 1
... and so on, up to 5923
Answer 0 (score: 1)
Maybe you want something like this:
library(igraph)
library(Matrix)
# download the example data to a temporary file and read it
tf <- tempfile(fileext = ".csv")
download.file("https://www.dropbox.com/s/q7sxfwjec97qzcy/people.csv?dl=1",
              tf, mode = "wb")
people <- read.csv(tf)
# person-by-repository incidence matrix (rows: people, columns: repositories)
A <- spMatrix(nrow = length(unique(people$people)),
              ncol = length(unique(people$repository_id)),
              i = as.numeric(factor(people$people)),
              j = as.numeric(factor(people$repository_id)),
              x = rep(1, nrow(people)))
row.names(A) <- levels(factor(people$people))
colnames(A) <- levels(factor(people$repository_id))
# A %*% t(A): entry [a, b] counts the repositories shared by a and b
adj <- tcrossprod(A)
# weighted, undirected graph; diag = FALSE drops self-loops
g <- graph.adjacency(adj, mode = "undirected", weighted = TRUE, diag = FALSE)
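To see why `tcrossprod()` gives the person-to-person weights, here is a minimal sketch on made-up data. The `toy` data frame and its values are assumptions for illustration only, and it uses `sparseMatrix()` from the Matrix package rather than `spMatrix()`, which builds the same kind of incidence matrix.

```r
library(Matrix)

# Made-up data: person 1 touched item 10, person 2 touched items 10 and 20,
# person 3 touched item 20.
toy <- data.frame(people = c(1, 2, 2, 3),
                  repository_id = c(10, 10, 20, 20))

pf <- factor(toy$people)
rf <- factor(toy$repository_id)

# Incidence matrix: one row per person, one column per repository
M <- sparseMatrix(i = as.integer(pf), j = as.integer(rf),
                  x = rep(1, nrow(toy)),
                  dims = c(nlevels(pf), nlevels(rf)),
                  dimnames = list(levels(pf), levels(rf)))

# tcrossprod(M) is M %*% t(M): off-diagonal entry [a, b] counts the
# repositories shared by a and b; the diagonal counts each person's own items.
shared <- as.matrix(tcrossprod(M))
print(shared)
```

Persons 1 and 2 share one item, persons 1 and 3 share none, and the diagonal is what `diag = FALSE` discards when the graph is built.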
See also here.
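Applied to the question's columns, the same idea might look like the sketch below. It is not a tested solution for the real file: the `commits` data frame is a small made-up stand-in for `commits.csv`, and the temporary output path is an assumption. Rows with commit_id 0 are dropped before the matrix is built, while the factor levels are taken from the unfiltered data so that people who only have commit_id 0 still appear as isolated vertices.

```r
library(igraph)
library(Matrix)

# Made-up stand-in for read.csv("commits.csv"); commit_id 0 means "no commit"
commits <- data.frame(people_id = c(1, 1, 2, 3, 3, 4, 5),
                      commit_id = c(0, 117, 117, 144, 0, 144, 0))

# Remember every person before filtering, so nobody disappears from the graph
people_levels <- sort(unique(commits$people_id))

# Ignore the non-existent commit_id 0
commits <- commits[commits$commit_id != 0, ]

pf <- factor(commits$people_id, levels = people_levels)
cf <- factor(commits$commit_id)

# Person-by-commit incidence matrix
C <- sparseMatrix(i = as.integer(pf), j = as.integer(cf),
                  x = rep(1, nrow(commits)),
                  dims = c(nlevels(pf), nlevels(cf)),
                  dimnames = list(levels(pf), levels(cf)))

# Shared-commit counts between people, then a weighted undirected graph
adjC <- tcrossprod(C)
comG <- graph_from_adjacency_matrix(adjC, mode = "undirected",
                                    weighted = TRUE, diag = FALSE)

# Inspect the edges and their weights
print(as_data_frame(comG, what = "edges"))

# Write a Pajek file (a temporary path is used here; change as needed)
out <- file.path(tempdir(), "comNetwork.net")
write_graph(comG, out, format = "pajek")
```

In this toy data, persons 1 and 2 are linked through commit 117 and persons 3 and 4 through commit 144, while person 5 stays in the graph without edges. `graph_from_adjacency_matrix()` and `write_graph()` are the current igraph names for `graph.adjacency()` and `write.graph()`.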