Parsing a large graph and getting a memory error

Time: 2018-12-27 08:36:13

Tags: python azure graph bigdata networkx

I am trying to parse a large graph, but it fails with a MemoryError. Which Azure data solution should I use, and how should I use it?

This is the code I ran on my computer:

import networkx as nx


class GraphFromTxt:
    def __init__(self, text):  # init from text file
        self.GraphStan = []
        with open(text, "r") as file:  # the file is closed once it has been read
            for line in file:
                self.GraphStan.append(line)

    def print_list(self):
        print(self.GraphStan)

    def length(self):
        print(len(self.GraphStan))

    def print_edges(self, G):
        print(G.edges())

    def parse(self):
        return nx.parse_edgelist(self.GraphStan, nodetype=int)


G_listed = GraphFromTxt("stan.txt")
G_listed.length()
G = G_listed.parse()

Output:

"C:\Users\Roy Greenberg\AppData\Local\Programs\Python\Python37-32\python.exe" "C:/Users/Roy Greenberg/PycharmProjects/Random-walks/Graph_from_txt.py"
7600595
Traceback (most recent call last):
  File "C:/Users/Roy Greenberg/PycharmProjects/Random-walks/Graph_from_txt.py", line 26, in <module>
    G = G_listed.parse()
  File "C:/Users/Roy Greenberg/PycharmProjects/Random-walks/Graph_from_txt.py", line 21, in parse
    return nx.parse_edgelist(self.GraphStan, nodetype=int)
  File "C:\Users\Roy Greenberg\AppData\Local\Programs\Python\Python37-32\lib\site-packages\networkx\readwrite\edgelist.py", line 296, in parse_edgelist
    G.add_edge(u, v, **edgedata)
  File "C:\Users\Roy Greenberg\AppData\Local\Programs\Python\Python37-32\lib\site-packages\networkx\classes\graph.py", line 900, in add_edge
    datadict = self._adj[u].get(v, self.edge_attr_dict_factory())
MemoryError

Process finished with exit code 1

1 answer:

Answer 0 (score: 0)

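Judging only by your error output, you appear to be using 32-bit Python on Windows (note the Python37-32 directory in the traceback). That limits your Python process to a maximum of about 2 GB of memory in which to build the networkx graph. See the SO thread "Python 32-bit memory limits on 64bit windows" for details.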
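To confirm which build is actually running the script, a quick check like the following may help. It is a minimal sketch using only the standard library; the helper name python_bitness is made up for illustration.

```python
import struct
import sys


def python_bitness() -> int:
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # struct.calcsize("P") is the size of a C pointer in bytes for this build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"{python_bitness()}-bit Python")
    # sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
    print("64-bit build:", sys.maxsize > 2**32)
```

If this reports 32 on a 64-bit Windows machine, the interpreter itself is the 32-bit one, regardless of how much RAM is installed.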

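So, in my experience, the MemoryError means that your job tried to allocate more memory than 32-bit Python allows. The allocation would have pushed the process past its maximum memory limit, which caused the error.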

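So, assuming your local machine has enough memory, my suggestion is to run your script again with 64-bit Python. Alternatively, you could consider a workaround: build one partial graph at a time, dump it to disk before parsing the next one, and link these subgraphs together, like a linked list, so they can be loaded later.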
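The partial-graph workaround could look roughly like the sketch below. It is only one possible reading of the idea, not a tested solution for the 7.6-million-line file: the function names, the chunk size, the part_NNNNN.pickle file naming and the build hook are all assumptions made for illustration. The only networkx call used is nx.parse_edgelist, the same one as in the question. Edges that connect nodes in different chunks end up in different partial graphs, so any later processing has to account for that.

```python
import itertools
import pickle
from pathlib import Path


def parse_with_networkx(lines):
    """Default builder: parse one chunk of edge-list lines into a networkx graph."""
    # Imported lazily so the chunking helpers can be reused with another parser.
    import networkx as nx

    return nx.parse_edgelist(lines, nodetype=int)


def dump_partial_graphs(edge_file, out_dir, lines_per_chunk=1_000_000,
                        build=parse_with_networkx):
    """Read edge_file chunk by chunk, pickle each partial graph, return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(edge_file, "r") as handle:
        while True:
            # Only lines_per_chunk lines are held in memory at any time.
            chunk = list(itertools.islice(handle, lines_per_chunk))
            if not chunk:
                break
            partial = build(chunk)
            path = out_dir / f"part_{len(paths):05d}.pickle"
            with open(path, "wb") as out:
                pickle.dump(partial, out, protocol=pickle.HIGHEST_PROTOCOL)
            paths.append(path)
            del partial  # drop the reference before parsing the next chunk
    return paths


def iter_partial_graphs(paths):
    """Load the dumped partial graphs one at a time, in the order they were written."""
    for path in paths:
        with open(path, "rb") as handle:
            yield pickle.load(handle)
```

With the file from the question, the call would be along the lines of paths = dump_partial_graphs("stan.txt", "stan_parts"), followed by a loop over iter_partial_graphs(paths). The ordered list of paths plays the role of the "linked list" of subgraphs. A suitable lines_per_chunk value depends on how much memory each partial graph needs and would have to be found by experiment.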