WriteBatch operation fails with py2neo

Time: 2013-11-15 21:00:17

Tags: python neo4j py2neo

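I am trying to find a workaround for the following problem. I have seen it more or less described in this SO question, but it was never really answered.

The following code fails, starting from a fresh graph: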

from py2neo import neo4j

def add_test_nodes():
    # Add a test node manually
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345, {"user_id":12345})

def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)

    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}

    # batch requests
    a = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
    batch.set_properties(a, new_node_data)  #<-- I'm the problem

    # execute batch requests and clear
    batch.run()
    batch.clear()

if __name__ == '__main__':
    # Initialize Graph DB service and create a Users node index
    g = neo4j.GraphDatabaseService()
    users_idx = g.get_or_create_index(neo4j.Node, "Users")

    # run the test functions
    add_test_nodes()
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    print alice

    do_batch(g)

    # get alice back and assert additional properties were added
    alice = g.get_or_create_indexed_node("Users", "user_id", 12345)
    assert "name" in alice

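In short, I want to update the properties of an existing indexed node in a single batch transaction. The failure occurs at the batch.set_properties line, because the BatchRequest object returned by the previous line is not interpreted as a valid node. Although the case is not quite identical, it feels like I am attempting something like the answer posted here.

Some details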

>>> import py2neo
>>> py2neo.__version__
'1.6.0'
>>> g = py2neo.neo4j.GraphDatabaseService()
>>> g.neo4j_version
(2, 0, 0, u'M06') 

Update

If I split the problem into separate batches, it runs without error:

def do_batch(graph):
    # Begin batch write transaction
    batch = neo4j.WriteBatch(graph)

    # get some updated node properties to add
    new_node_data = {"user_id":12345, "name": "Alice"}

    # batch request 1
    batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})

    # execute batch request and clear
    alice = batch.submit()[0]  # submit() returns a list of results
    batch.clear()

    # batch request 2
    batch.set_properties(alice, new_node_data)

    # execute batch request and clear
    batch.run()
    batch.clear()

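This works for many nodes as well. I don't love the idea of splitting the batch up, but it may be the only way for now. Does anyone have any comments on this?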
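For many nodes, the two-batch workaround could be wrapped in a small helper. This is only a minimal sketch: update_indexed_nodes and its parameters are hypothetical names, the batch is passed in as a factory so nothing here depends on a running server, and it assumes that submit() returns one result per request in request order.

```python
def update_indexed_nodes(batch_factory, node_type, index_name, key, updates):
    """Get or create each indexed node, then set its properties.

    batch_factory: zero-argument callable returning a fresh write batch,
                   e.g. lambda: neo4j.WriteBatch(graph) (assumed usage)
    node_type:     the class handed to get_or_create_in_index, e.g. neo4j.Node
    updates:       dict mapping index value -> dict of properties to set
    """
    values = list(updates)

    # Batch 1: resolve every node through the index.
    batch = batch_factory()
    for value in values:
        batch.get_or_create_in_index(node_type, index_name, key, value, {})
    nodes = batch.submit()  # assumed: one result per request, in order

    # Batch 2: the results are concrete nodes now, so they can be
    # passed to set_properties directly.
    batch = batch_factory()
    for node, value in zip(nodes, values):
        batch.set_properties(node, updates[value])
    batch.run()
    return nodes
```

With the code above it would be called as update_indexed_nodes(lambda: neo4j.WriteBatch(g), neo4j.Node, "Users", "user_id", {12345: {"user_id": 12345, "name": "Alice"}}). It still costs two round trips, but the number of round trips no longer grows with the number of nodes.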

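2 Answers:

Answer 0: (score: 5)

After reading up on all the new features of Neo4j 2.0.0-M06, it seems that the older workflow of node and relationship indexes is being superseded. Neo4j is currently diverging somewhat in how indexing is done, namely through labels and schema indexes.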

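Labels

Labels can be attached arbitrarily to nodes and can serve as a reference for an index.

Indexes

An index can be created in Cypher by referencing a label (here, User) and a node property key (screen_name):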

CREATE INDEX ON :User(screen_name)

Cypher MERGE

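Furthermore, the indexed get_or_create methods can now be achieved through the new Cypher MERGE clause, which incorporates labels and their indexes quite succinctly: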

MERGE (me:User{screen_name:"SunPowered"}) RETURN me

Batch

Queries can be batched in py2neo by appending a CypherQuery instance to a batch object:

from py2neo import neo4j

graph_db = neo4j.GraphDatabaseService()
cypher_merge_user = neo4j.CypherQuery(graph_db, 
    "MERGE (user:User {screen_name:{name}}) RETURN user")

def get_or_create_user(screen_name):
    """Return the user if exists, create one if not"""
    return cypher_merge_user.execute_one(name=screen_name)

def get_or_create_users(screen_names):
    """Apply the get or create user cypher query to many usernames in a 
    batch transaction"""

    batch = neo4j.WriteBatch(graph_db)

    for screen_name in screen_names:
        batch.append_cypher(cypher_merge_user, params=dict(name=screen_name))

    return batch.submit()

root = get_or_create_user("Root")
users = get_or_create_users(["alice", "bob", "charlie"])

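Limitations

There is a limitation, however: the result of a Cypher query in a batch transaction cannot be referenced later in the same transaction. The original question concerned updating a collection of indexed user properties in one batch transaction. As far as I can tell, this is still not possible. For example, the following snippet raises an error: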

batch = neo4j.WriteBatch(graph_db)
b1 = batch.append_cypher(cypher_merge_user, params=dict(name="Alice"))
batch.set_properties(b1, dict(last_name="Smith"))
resp = batch.submit()

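So, although implementing get_or_create on labelled nodes with py2neo carries slightly less overhead because the legacy indexes are no longer required, the original problem still needs 2 separate batch transactions to complete.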
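One idea that might sidestep the limitation, offered only as an untested sketch: put the property update into the MERGE statement itself, so that no later batch job has to refer to an earlier result. The helper name merge_and_set and the parameter names merge_value and p_<name> are made up for illustration. It assumes the server accepts a SET clause after MERGE, as Neo4j 2.0 Cypher does; I have not checked this against 2.0.0-M06. Note that SET changes individual properties, whereas set_properties replaces them all.

```python
import re

# Labels and property names end up in the query text, so only plain
# identifiers are allowed through.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def merge_and_set(label, key, value, properties):
    """Build one Cypher statement that merges a node and sets properties.

    Returns (query, params). Property names cannot be passed as Cypher
    parameters, so they are validated and written into the query text;
    every value travels as a parameter.
    """
    names = sorted(properties)
    for name in [label, key] + names:
        if not _IDENTIFIER.match(name):
            raise ValueError("unsafe identifier: %r" % (name,))

    query = "MERGE (n:%s {%s: {merge_value}})" % (label, key)
    if names:
        query += " SET " + ", ".join(
            "n.%s = {p_%s}" % (name, name) for name in names)
    query += " RETURN n"

    params = {"merge_value": value}
    for name in names:
        params["p_" + name] = properties[name]
    return query, params

query, params = merge_and_set(
    "User", "screen_name", "Alice", {"last_name": "Smith"})
```

The resulting query could then be wrapped in neo4j.CypherQuery(graph_db, query) and appended with batch.append_cypher(..., params=params), following the same pattern as get_or_create_users above, with one statement per user in a single batch.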

Answer 1: (score: 1)

Your problem seems to lie not in batch.set_properties() but in the output of batch.get_or_create_in_index(). If you add the node with batch.create(), it works:

db = neo4j.GraphDatabaseService()

batch = neo4j.WriteBatch(db)
# create a node instead of getting it from index
test_node = batch.create({'key': 'value'})
# set new properties on the node
batch.set_properties(test_node, {'key': 'foo'})

batch.submit()

If you look at the attributes of the BatchRequest objects returned by batch.create() and batch.get_or_create_in_index(), the URIs differ because the two methods use different parts of the Neo4j REST API:

test_node = batch.create({'key': 'value'})
print test_node.uri # node
print test_node.body # {'key': 'value'}
print test_node.method # POST

index_node = batch.get_or_create_in_index(neo4j.Node, "Users", "user_id", 12345, {})
print index_node.uri # index/node/Users?uniqueness=get_or_create
print index_node.body # {u'value': 12345, u'key': 'user_id', u'properties': {}}
print index_node.method # POST

batch.submit()

So I guess batch.set_properties() somehow cannot handle the URI of the indexed node? That is, it doesn't really get the correct URI for the node?
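To make the guess concrete, here is a rough sketch of what such a two-job request body might look like at the REST level. The Neo4j REST batch endpoint takes a list of jobs with "method", "to", "body" and "id" fields, and a later job can refer to an earlier one with a "{id}" placeholder. The function name batch_payload is hypothetical and this is not py2neo's actual internal code. Whether "{0}" resolves to a usable node URI when job 0 is an index get_or_create is exactly the open question here.

```python
import json

def batch_payload(index_name, key, value, properties):
    """Build an illustrative JSON body for a two-job batch request."""
    jobs = [
        {   # job 0: get or create the node through the legacy index
            "id": 0,
            "method": "POST",
            "to": "/index/node/%s?uniqueness=get_or_create" % index_name,
            "body": {"key": key, "value": value, "properties": {}},
        },
        {   # job 1: "{0}" is a placeholder for the URI returned by job 0
            "id": 1,
            "method": "PUT",
            "to": "{0}/properties",
            "body": properties,
        },
    ]
    return json.dumps(jobs, indent=2, sort_keys=True)

payload = batch_payload("Users", "user_id", 12345, {"name": "Alice"})
```

If job 0 hands back something other than the node's own URI, the "to" of job 1 would point at the wrong resource, which would be consistent with the failure described in the question.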

This doesn't solve the problem, but it may be a pointer for someone else ;)