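Quickly get distinct properties of all interconnected nodes in a huge graph

Time: 2017-02-09 15:22:58

Tags: neo4j

My Neo4j database currently has more than 13 million nodes and a large number of edges. The simplified structure looks like this (most edge types omitted):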

User - HAS_EMAIL -> Email // is unique
     - HAS_IBAN  -> Iban  // is unique
     - HAS_PHONE -> Phone // is unique

I want to get the IDs of all users that are connected to each other, regardless of the length of the path between them.

I am using Neo4j's HTTP API and started out with a Cypher query like the following.

MATCH (u:User {uid: '12345'})-[*1..]-(otherUser) 
RETURN DISTINCT otherUser

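With variable-length pattern matching that has no upper bound, and especially without a LIMIT, this was very slow.

So I dug around a bit and found the APOC library with its expandConfig procedure.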

MATCH (u:User {uid: '12345'})
CALL apoc.path.expandConfig(u, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
// Extracting the 'uid' property
RETURN extract(
  n IN (
    // We only want 'User' nodes
    filter (
      x IN NODES(path) WHERE 'User' IN labels(x)
    )
  ) | n.uid
) as uid

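This works like a charm and in most cases returns all nodes within a few milliseconds.

Querying a user that I know is "very well" connected (24k nodes, 40k edges) takes almost 30 seconds.

Example response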

{
  "results": [
    {
      "columns": [
        "uid"
      ],
      "data": [
        {"row": [["9974"]], "meta": [null]},
        {"row": [["9974"]], "meta": [null]},
        {"row": [["9974"]], "meta": [null]},
        {"row": [["9974","14367"] ],"meta": [null,null]},
        {"row": [["9974","11820"] ],"meta": [null,null]},
        {"row": [["9974","11821"] ],"meta": [null,null]},
        {"row": [["9974","11822"] ],"meta": [null,null]},
        {"row": [["9974","11823"] ],"meta": [null,null]},
        {"row": [["9974","9314"] ],"meta": [null,null]},
        {"row": [["9974","9313"] ],"meta": [null,null]},
        {"row": [["9974","9317"] ],"meta": [null,null]},
        {"row": [["9974","14367"] ],"meta": [null,null]},
        {"row": [["9974","11820"] ],"meta": [null,null]},
        {"row": [["9974","11821"] ],"meta": [null,null]},
        {"row": [["9974","11822"] ],"meta": [null,null]},
        {"row": [["9974","11823"] ],"meta": [null,null]},
        {"row": [["9974","9314"] ],"meta": [null,null]},
        {"row": [["9974","9313"] ],"meta": [null,null]},
        {"row": [["9974","9317"] ],"meta": [null,null]},
        {"row": [["9974","11820","3287" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","39584" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","5109" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","3379" ]],"meta": [null,null,null]},
        {"row": [["9974","11820","3288" ]],"meta": [null,null,null]},
        --- snip ---

Now I would like to get rid of all the duplicates and get a result like the following

{
  "results": [
    {
      "columns": [
        "uid"
      ],
      "data": [
        {"row": [["9974"]], "meta": [null]},
        {"row": [["14367"]], "meta": [null]},
        {"row": [["11820"]], "meta": [null]},
        {"row": [["11821"]],"meta": [null]},
        {"row": [["11822"]],"meta": [null]},
        {"row": [["11823"]],"meta": [null]},
        {"row": [["9314"]],"meta": [null]},
        {"row": [["9313"]],"meta": [null]},
        {"row": [["9317"]],"meta": [null]},
        {"row": [["14367"]],"meta": [null]},
        {"row": [["11820"]],"meta": [null]},
        {"row": [["11821"]],"meta": [null]},
        {"row": [["11822"]],"meta": [null]},
        {"row": [["11823"]],"meta": [null]},
        --- snip ---

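How would I do that? Nice to have: is there a way to make this faster?

1 Answer:

Answer 0: (score: 1)

There are a few tweaks that can make this faster.

First, you are iterating over NODES(path) for every path. That produces a lot of duplicate nodes, because paths that share a common prefix reuse the same nodes.

Since you are using NODE_GLOBAL uniqueness, each node is reached by only one path, so the end nodes of all paths together should make up the whole subgraph. We can therefore take the end nodes as rows, filter for :User nodes (Cypher has a dedicated syntax for checking whether a node has a given label), and then get the uids.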

MATCH (u:User {uid: '12345'})
CALL apoc.path.expandConfig(u, {bfs:true, uniqueness:"NODE_GLOBAL"}) YIELD path
WITH DISTINCT LAST(NODES(path)) as user
WHERE user:User
RETURN COLLECT(user.uid) as uid
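
To see why taking only the end node of each path avoids the duplicates, here is a small Python sketch of a breadth-first expansion that visits every node at most once. It is only a toy model of the idea, not APOC's implementation: the graph, the function name `bfs_paths_node_global`, and the convention that user nodes start with "u" are all made up for illustration, and the sketch also emits the start node as a zero-length path.

```python
from collections import deque


def bfs_paths_node_global(graph, start):
    """Breadth-first expansion that emits one path per reached node.

    Each node is visited at most once (a toy version of global node
    uniqueness), so every node is the end of exactly one path.
    """
    visited = {start}
    queue = deque([[start]])
    paths = []
    while queue:
        path = queue.popleft()
        paths.append(path)
        for neighbour in graph[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return paths


# Hypothetical toy graph: three users linked through a shared email and phone.
graph = {
    "u1": ["e1"],
    "e1": ["u1", "u2"],
    "u2": ["e1", "p1"],
    "p1": ["u2", "u3"],
    "u3": ["p1"],
}

paths = bfs_paths_node_global(graph, "u1")

# Analogous to NODES(path) for every path: shared prefixes repeat nodes.
all_nodes = [node for path in paths for node in path]

# Analogous to LAST(NODES(path)): each node shows up exactly once.
end_nodes = [path[-1] for path in paths]

# Stand-in for the label check: keep only the "user" nodes.
user_ids = [node for node in end_nodes if node.startswith("u")]

print(len(all_nodes), len(end_nodes), user_ids)
```

In this toy graph the per-path node lists contain 15 entries for only 5 distinct nodes, while the end nodes are already unique, which is the effect the query above relies on.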

If you don't want the uids in a single collection, just return user.uid at the end instead.
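
The two variants come back in different shapes over the HTTP API: with COLLECT there is a single row holding one list, while returning user.uid gives one row per uid. A client-side helper along the following lines could handle both. This is a sketch only: `extract_uids` is a hypothetical name, and the two sample bodies are hand-written stand-ins modeled on the response format shown in the question, not real query output.

```python
import json


def extract_uids(response_text):
    """Return the unique uids from a response body, in first-seen order."""
    body = json.loads(response_text)
    seen = {}  # dict keys keep insertion order, so this doubles as an ordered set
    for result in body["results"]:
        for entry in result["data"]:
            for value in entry["row"]:
                # COLLECT(...) yields a list per row; a plain RETURN yields a scalar.
                values = value if isinstance(value, list) else [value]
                for uid in values:
                    seen.setdefault(uid, None)
    return list(seen)


# Hypothetical sample bodies for illustration only.
collected = (
    '{"results":[{"columns":["uid"],"data":['
    '{"row":[["9974","14367","11820"]],"meta":[null,null,null]}]}]}'
)
per_row = (
    '{"results":[{"columns":["uid"],"data":['
    '{"row":["9974"],"meta":[null]},'
    '{"row":["14367"],"meta":[null]},'
    '{"row":["9974"],"meta":[null]}]}]}'
)

print(extract_uids(collected))
print(extract_uids(per_row))
```

The duplicate in the second sample is there only to show the defensive de-duplication; with WITH DISTINCT in the query, the database should already do most of that work.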