I'm building a huge graph database with more than 1.4 million nodes and 160 million relationships. My code is as follows:
from py2neo import neo4j

# first we create all the nodes
batch = neo4j.WriteBatch(graph_db)
nodedata = []
for index, i in enumerate(words):  # words is predefined
    batch.create({"term": i})
    if index % 5000 == 0:  # so as not to exceed the batch restrictions
        results = batch.submit()
        for x in results:
            nodedata.append(x)
        batch = neo4j.WriteBatch(graph_db)
results = batch.submit()
for x in results:
    nodedata.append(x)
# nodedata contains all the node instances now
# time to create relationships
batch = neo4j.WriteBatch(graph_db)
for iindex, i in enumerate(weightdata):  # weightdata is predefined
    # I actually pick the nodedata indexes differently; iindex and -iindex are just an example
    batch.create((nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))
    if iindex % 5000 == 0:  # again batch constraints
        batch.submit()  # this is the line that raises the error
        batch = neo4j.WriteBatch(graph_db)
batch.submit()
I get the following error:
Traceback (most recent call last):
  File "test.py", line 53, in <module>
    batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 351, in send
    rs = self._send_request(request.method, request.uri, request.body, request.$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 326, in _send_re$
    data = json.dumps(data, separators=(",", ":"))
  File "/usr/lib64/python2.6/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib64/python2.6/json/encoder.py", line 306, in _iterencode
    for chunk in self._iterencode_list(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 204, in _iterencode_list
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 317, in _iterencode
    for chunk in self._iterencode_default(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 323, in _iterencode_default
    newobj = self.default(o)
  File "/usr/lib64/python2.6/json/encoder.py", line 344, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 3448 is not JSON serializable
Can anyone tell me what exactly is going on here and how I can get past it? Any kind of help would be greatly appreciated. Thanks in advance! :)
Answer 0 (score: 1)
I've never used py2neo, but going by the documentation, this:
batch.create((nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))
is missing the rel() wrapper:
from py2neo import rel
batch.create(rel(nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))
Answer 1 (score: 1)
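It's hard to say without being able to run your code on the same dataset, but this is most likely caused by the type of the items in weightdata.
Step through the code, or print the data types, to find out the type of i in the {"weight": i} part of the relationship descriptor. You may find that it is not an int, which is what the JSON serialization of a number like 3448 needs here. If that theory is correct, you will need to find a way to cast or otherwise convert the property value to an int before using it in the property set.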
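A minimal sketch of that check, using only the standard library and assuming the weights are whole numbers. The helper names find_unserializable and to_json_safe are hypothetical, and decimal.Decimal is only a stand-in for whatever non-int type weightdata really holds, since the question does not say what that type is. The sketch reports which values json.dumps rejects, then converts them with int() before they go into the {"weight": ...} property dict.

```python
import json
from decimal import Decimal


def find_unserializable(values):
    """Return (index, type name) for every value json.dumps cannot encode."""
    bad = []
    for index, value in enumerate(values):
        try:
            json.dumps(value)
        except TypeError:
            bad.append((index, type(value).__name__))
    return bad


def to_json_safe(values):
    """Convert every value to a plain Python int (assumes integral weights)."""
    return [int(value) for value in values]


# Hypothetical data: Decimal stands in for the unknown non-int type in weightdata.
weightdata = [3448, Decimal("3448"), 12]

print(find_unserializable(weightdata))  # [(1, 'Decimal')]

safe_weights = to_json_safe(weightdata)
print(json.dumps([{"weight": w} for w in safe_weights]))
```

If the weights can be fractional, convert with float() instead, because int() truncates toward zero. In the question's loop, the conversion would be applied to i just before it is placed in {"weight": i}.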