Question

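I am working with Twitter data in R. What I have is a .csv file containing the friend IDs (the accounts they follow) of 365 users. I am trying to convert this data into an edge list and finally export a .graphml or Pajek file for network analysis using the igraph package.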

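The .csv file contains 365 rows, one row per user I looked up. The first column of each row holds the user ID, and the following columns hold the IDs of that user's friends (the people he/she follows). I am interested in the network among these 365 users, so I have to filter out all other users.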

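The problem is that R seems to break up every row that has more than 714 columns (without any error message). When I read in the csv file (read.csv), the data frame contains 456 rows (it should contain 365) and 714 columns (it should contain more than 12,500, since one user has that many friends).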

I have not found any information about a column limit in R. Everything I found was about memory limits. So here are my questions:

Is there a way to tell R not to break up the rows, or to allow this many columns?
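One thing that may be worth checking here: according to the read.table documentation, the number of columns is determined from the first five lines of input (unless col.names is longer), and read.csv defaults to fill = TRUE, so a longer line further down can be spread over several rows instead of raising an error. The sketch below reproduces that on a hypothetical toy file and then passes an explicit column count obtained with count.fields. The file contents and the names tmp, wrapped, n_fields and fixed are assumptions made for illustration, and this has not been run against the real friends.csv.

```r
# Hypothetical toy file: the first five lines have at most 3 fields,
# the sixth line has 7 (it stands in for the user with very many friends).
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;2;3", "2;1", "3;1;2", "4;1", "5;2", "6;1;2;3;4;5;7"), tmp)

# Default call: the column count is guessed from the first five lines,
# so the long sixth line is spread over several rows.
wrapped <- read.csv(tmp, header = FALSE, sep = ";")
dim(wrapped)

# Count the fields on every line and pass that many column names explicitly.
n_fields <- count.fields(tmp, sep = ";")
fixed <- read.csv(tmp, header = FALSE, sep = ";", fill = TRUE,
                  col.names = paste0("V", seq_len(max(n_fields))))
dim(fixed)
```

On the toy file the default call gives more rows than the file has lines, while the second call gives one row per line with short rows padded with NA. If the same mechanism is at work on the real file, replacing tmp with "friends.csv" should give 365 rows.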

What is the best way to format the data? I guess this many columns is not the best idea?
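One possible alternative to a very wide data frame is to skip read.csv entirely, read the file line by line, and build a long two-column edge list directly. The following is only a sketch under assumptions: the toy vector lines stands in for readLines("friends.csv"), and the names parts, users and edges are hypothetical. IDs are kept as character strings, which also sidesteps any rounding of very large numeric IDs.

```r
# Hypothetical toy input standing in for readLines("friends.csv"):
# first field = user ID, remaining fields = IDs of that user's friends.
lines <- c("1;2;3;99", "2;1;42", "3;1;2;7;8;9")

# Split every line on the separator; each element is one user's record.
parts <- strsplit(lines, ";", fixed = TRUE)

# The first field of each record is the user ID.
users <- vapply(parts, `[`, character(1), 1)

# For each user, keep only friends that are themselves among the users
# and stack the resulting (from, to) pairs into one long data frame.
edges <- do.call(rbind, lapply(parts, function(p) {
  kept <- intersect(p[-1], users)
  data.frame(from = rep(p[1], length(kept)), to = kept,
             stringsAsFactors = FALSE)
}))
edges
```

The resulting data frame has one row per edge, so its width no longer depends on how many friends any single user has.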

Here is my code. It works fine, but because there are more rows than there should be, there are also more nodes than there should be.

friends <- read.csv("friends.csv", header = FALSE, check.names = FALSE, sep = ";")

dim(friends)

# edge list
users <- friends[, 1]  # user IDs from the first column
from <- NULL
to <- NULL
for (i in seq_len(nrow(friends))) {  # starts at the first row of the friends file
  # friends of the user in row i that are themselves in the user column
  a <- intersect(unlist(friends[i, -1]), users)
  from <- c(from, rep(friends[i, 1], length(a)))
  to <- c(to, a)
}
raWnet <- data.frame(from, to)


# PRODUCE GRAPH
library(igraph)
el <- as.matrix(raWnet)  # coerces the data into a two-column matrix format that igraph likes
el[, 1] <- as.character(el[, 1])
el[, 2] <- as.character(el[, 2])
net <- graph_from_edgelist(el, directed = TRUE)  # current name of graph.edgelist()

write_graph(net, file = "atpoltwit.graphml", format = "graphml")  # current name of write.graph()
write_graph(net, file = "atpoltwit.NET", format = "pajek")

R breaks rows with more than 714 columns in a csv

0 answers: