TypeError: not JSON serializable on py2neo batch submit

Time: 2014-07-10 13:50:40

Tags: python json neo4j batch-processing py2neo

I am creating a huge graph database with more than 1.4 million nodes and 160 million relationships. My code is as follows:

from py2neo import neo4j
# first we create all the nodes
batch = neo4j.WriteBatch(graph_db)
nodedata = []

for index, i in enumerate(words): # words is predefined
    batch.create({"term":i})
    if index%5000 == 0: #so as not to exceed the batch restrictions
        results = batch.submit()
        for x in results:
            nodedata.append(x)
        batch = neo4j.WriteBatch(graph_db)

results = batch.submit()
for x in results:
    nodedata.append(x)

#nodedata contains all the node instances now
#time to create relationships

batch = neo4j.WriteBatch(graph_db)
for iindex, i in enumerate(weightdata): #weightdata is predefined 
    batch.create((nodedata[iindex], "rel", nodedata[-iindex], {"weight": i})) #there is a different way how I decide the indexes of nodedata, but just as an example I put iindex and -iindex
    if iindex%5000 == 0: #again batch constraints
        batch.submit() #this is the line that shows error
        batch = neo4j.WriteBatch(graph_db)
batch.submit()

I get the following error:

Traceback (most recent call last):
  File "test.py", line 53, in <module>
    batch.submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2116, in submit
    for response in self._submit()
  File "/usr/lib/python2.6/site-packages/py2neo/neo4j.py", line 2085, in _submit
    for id_, request in enumerate(self.requests)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 427, in _send
    return self._client().send(request)
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 351, in send
    rs = self._send_request(request.method, request.uri, request.body, request.$
  File "/usr/lib/python2.6/site-packages/py2neo/rest.py", line 326, in _send_re$
    data = json.dumps(data, separators=(",", ":"))
  File "/usr/lib64/python2.6/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib64/python2.6/json/encoder.py", line 306, in _iterencode
    for chunk in self._iterencode_list(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 204, in _iterencode_list
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 317, in _iterencode
    for chunk in self._iterencode_default(o, markers):
  File "/usr/lib64/python2.6/json/encoder.py", line 323, in _iterencode_default
    newobj = self.default(o)
  File "/usr/lib64/python2.6/json/encoder.py", line 344, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 3448 is not JSON serializable

Can anyone tell me what exactly is happening here and how I can overcome it? Any kind of help would be appreciated. Thanks in advance! :)

2 answers:

Answer 0 (score: 1)

I have never used py2neo, but judging from the documentation, this call:

batch.create((nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))

is missing the rel() wrapper:

batch.create(rel(nodedata[iindex], "rel", nodedata[-iindex], {"weight": i}))

Answer 1 (score: 1)

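It is hard to say without being able to run your code against the same data set, but this is most likely caused by the type of the items in weightdata.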

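Step through the code, or print the data types, to determine the type of i in the {"weight": i} part of the relationship descriptor. You may find that it is not a built-in int, which is what the json module needs in order to serialize it as a JSON number (a built-in float would also work). If this theory is correct, you will need to convert or otherwise cast the property value to an int before using it in the property set.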
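
As a rough sketch of that check, the snippet below prints the type of each weight, tests whether json.dumps accepts it, and then coerces the values to built-in numbers. It does not touch py2neo at all. The sample weightdata, the helper names to_json_number and is_json_serializable, and the use of fractions.Fraction as a stand-in for a "number-like but not built-in" type are all assumptions made for illustration; the real type of the items in the question's weightdata is unknown.

```python
import json
import numbers
from fractions import Fraction


def to_json_number(value):
    """Return value as a built-in int or float that json.dumps accepts."""
    if isinstance(value, bool):
        return value  # bool is already serializable; keep it as-is
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real):
        return float(value)
    raise TypeError("unsupported weight type: %r" % type(value))


def is_json_serializable(value):
    """Report whether json.dumps can encode value without a TypeError."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


# Hypothetical sample data; Fraction stands in for a non-built-in numeric type.
weightdata = [3448, Fraction(7, 2), 0.25]

# Inspect each item: its repr, its type name, and whether it would serialize.
for i in weightdata:
    print("%r %s %s" % (i, type(i).__name__, is_json_serializable({"weight": i})))

# Coerce everything to built-in numbers before building the property dicts.
clean = [to_json_number(i) for i in weightdata]
```

If the printout shows a type other than int or float for the failing value, passing {"weight": to_json_number(i)} instead of {"weight": i} into batch.create would be one way to apply the cast suggested above.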