Py2neo Neo4j batch submit error

Time: 2014-07-09 15:02:44

Tags: python batch-file neo4j py2neo

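I have a JSON file containing data for roughly 1.4 million nodes, and I want to build a Neo4j graph database from it. I tried using py2neo's batch submit feature. My code is as follows: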

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later
for i in words:
    nodedict[i] = batch.create({"name":i})
results = batch.submit()

The error shown is as follows:

Traceback (most recent call last):
  File "test.py", line 36, in <module>
    results = batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 364, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Loc$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 278, in __init__
    raise SystemError(body)
SystemError: None

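Can anyone tell me what exactly is happening here? Does it have anything to do with the fact that the batch is quite large? If so, what can be done about it? Thanks in advance! :)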

3 Answers:

Answer 0 (score: 3)

So here is what I figured out (thanks to this question: py2neo - Neo4j - System Error - Create Batch Nodes/Relationships):

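py2neo's batch submit feature has its own limit on how many queries a single batch can carry. I could not find an exact number for the upper bound, so I tried limiting each batch to 5000 queries and ran the following code: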

# the variable words is a list containing node names
from py2neo import neo4j
batch = neo4j.WriteBatch(graph_db)
nodedict = {}
# I decided to use a dictionary because I would be creating relationships
# by referring to the dictionary entries later

for index, i in enumerate(words):
    nodedict[i] = batch.create({"name":i})
    if (index + 1) % 5000 == 0: # submit after every 5000th create, not at index 0
        batch.submit()
        batch = neo4j.WriteBatch(graph_db) # As stated by Nigel below, I'm creating a new batch
batch.submit() # for the final batch

This way I sent the requests in batches of 5k queries each, and my entire graph was created successfully!
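
The chunked submission described above can be sketched without a running database. This is only an illustration, written for Python 3: `FakeBatch` and `submit_in_chunks` are hypothetical names that do not exist in py2neo, and the stand-in assumes that `submit()` returns one result per job in the order the jobs were added.

```python
# FakeBatch is a hypothetical stand-in, not part of py2neo.
class FakeBatch:
    """Mimics only the two calls used above: create() and submit()."""

    def __init__(self):
        self.jobs = []

    def create(self, properties):
        self.jobs.append(properties)

    def submit(self):
        # Assumption: results come back in the same order the jobs were added.
        return [dict(job, id=n) for n, job in enumerate(self.jobs)]


def submit_in_chunks(words, make_batch, chunk_size=5000):
    """Create one node per word, using a fresh batch for every chunk."""
    created = {}
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        batch = make_batch()  # new batch object per submission
        for word in chunk:
            batch.create({"name": word})
        # Pair each word with the result of its own create job.
        for word, result in zip(chunk, batch.submit()):
            created[word] = result
    return created


words = ["w%d" % n for n in range(12)]
created = submit_in_chunks(words, FakeBatch, chunk_size=5)
print(len(created))
```

Against a real server, `make_batch` could be something like `lambda: neo4j.WriteBatch(graph_db)`, as in the code above. Collecting the submitted results per chunk, rather than the values returned by `batch.create`, keeps the dictionary usable after each batch has been sent, provided the ordering assumption holds for your py2neo version.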

Answer 1 (score: 1)

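There is no real way to state a limit on the number of jobs a batch can contain; it can vary widely depending on many factors. In general, your best bet is to experiment to find the optimal size for your use case and stick with it. It looks like that is what you are doing already :-)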

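As for your solution, I would recommend one tweak. Batch objects are not designed to be reused, so instead of clearing the batch after each submission, simply create a new one. In any case, the ability to submit a batch multiple times will be removed in the next version of py2neo.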
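
One way to experiment with batch sizes, as suggested above, is to time the same workload at a few candidate sizes. The sketch below is written for Python 3 and is only a rough harness: `time_batch_sizes` and `fake_submit` are hypothetical names, and `fake_submit` stands in for whatever code builds a new batch, adds the chunk to it, and submits it.

```python
import time


def time_batch_sizes(items, candidate_sizes, submit_chunk):
    """Return {batch_size: seconds} for submitting all items at each size."""
    timings = {}
    for size in candidate_sizes:
        started = time.perf_counter()
        for start in range(0, len(items), size):
            submit_chunk(items[start:start + size])
        timings[size] = time.perf_counter() - started
    return timings


def fake_submit(chunk):
    # Hypothetical stand-in for: create a new batch, add the chunk, submit it.
    return len(chunk)


sample = list(range(10000))
timings = time_batch_sizes(sample, [500, 1000, 5000], fake_submit)
best = min(timings, key=timings.get)
print(best, timings[best])
```

With a real submit function, every candidate size writes the whole sample again, so this is better run on a small sample against a throwaway database than on the full data set.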

Answer 2 (score: 0)

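I ran into the same problem after I started using batch creation via graph.create(*alist). The answers above pointed me in the right direction, and I ended up using this snippet, taken from https://gist.github.com/anonymous/6293739 and inspired by py2neo - Neo4j - System Error - Create Batch Nodes/Relationships: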

chunk_size = 500
# xrange is Python 2 only; on Python 3 use range instead
chunks = (alist[pos:pos + chunk_size] for pos in xrange(0, len(alist), chunk_size))
for c in chunks:
    graph.create(*c)

P.S. py2neo == 2.0.7
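
The slicing approach above needs the whole list in memory. If the items come from a generator instead (for example, while reading a large file), a variant built on `itertools.islice` might look like the following sketch for Python 3. The name `iter_chunks` is hypothetical and not part of py2neo.

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


sizes = [len(c) for c in iter_chunks(range(1200), 500)]
print(sizes)  # [500, 500, 200]
```

Each chunk could then be passed on in the same way as in the snippet above, for example with `graph.create(*c)`.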