orientdb

Time: 2015-09-11 13:31:54

Tags: neo4j graph-databases orientdb

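I have an OrientDB graph database whose nodes are chained together by links of type NEXT. (There are several different series in my data, and no node has more than one incoming and one outgoing "NEXT" link.) The nodes all have a property called "name". I want to find all the sequences of names that occur when traversing the paths from start to end.

I.e. to get one sequence of names, start at a node with no incoming NEXT link, follow the NEXT links until you reach a node with no outgoing "NEXT" link, and collect the names of all the nodes you pass into a list.

E.g. a subgraph of the form     (Bob)-[NEXT]->(Sharon)-[NEXT]->(Carl) should yield the list     ["Bob", "Sharon", "Carl"]

To clarify, here is a Cypher (Neo4j) query that gets all the possible lists.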

    match (start) -[:NEXT*]-> (end),
    p = shortestPath(start-[:NEXT*]-> end)
    where not ()-[:NEXT]->(start) and not (end)-[:NEXT]->()
    return extract( s in nodes(p) | s.name ) as path

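However, I need to do this in OrientDB, and OrientDB does not use Cypher.

I would like to know whether this is possible in OrientDB and, if so, whether it is simpler in its SQL language or in Gremlin.

As a secondary question, ideally I don't want to return all the name lists, because what I really care about is how often each list occurs. So I would like to return the unique lists together with the frequency with which each particular list was found. Is this something that can be done in OrientDB, or do I have to retrieve all the path data from OrientDB as described above and do the aggregation elsewhere?

Update

I have created some sample data here, to match the statement of the original question.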

create database plocal:people
create class Person extends V
create property Person.name string
create property Person.age float
create property Person.ident integer   

insert into Person(name,age,ident) VALUES ("Bob", 30.5, 1)
insert into Person(name,age,ident) VALUES ("Bob", 30.5, 2)
insert into Person(name,age,ident) VALUES ("Carol", 20.3, 3)
insert into Person(name,age,ident) VALUES ("Carol", 19, 4)
insert into Person(name,age,ident) VALUES ("Laura", 75, 5)
insert into Person(name,age,ident) VALUES ("Laura", 60.5, 6)
insert into Person(name,age,ident) VALUES ("Laura", 46, 7)
insert into Person(name,age,ident) VALUES ("Mike", 16.3, 8)
insert into Person(name,age,ident) VALUES ("David", 86, 9)
insert into Person(name,age,ident) VALUES ("Alice", 5, 10)
insert into Person(name,age,ident) VALUES ("Nigel", 69, 11)
insert into Person(name,age,ident) VALUES ("Carol", 60, 12)
insert into Person(name,age,ident) VALUES ("Mike", 16.3, 13)
insert into Person(name,age,ident) VALUES ("Alice", 5, 14)
insert into Person(name,age,ident) VALUES ("Mike", 16.3, 15)

create class NEXT extends E

create edge NEXT from (select from Person where ident = 1) to (select from Person where ident = 3)
create edge NEXT from (select from Person where ident = 2) to (select from Person where ident = 4)
create edge NEXT from (select from Person where ident = 8) to (select from Person where ident = 12)
create edge NEXT from (select from Person where ident = 5) to (select from Person where ident = 15)
create edge NEXT from (select from Person where ident = 15) to (select from Person where ident = 14)
create edge NEXT from (select from Person where ident = 7) to (select from Person where ident = 13)
create edge NEXT from (select from Person where ident = 13) to (select from Person where ident = 10)

This should give me the following final result (count, then name list):

    2 ["Bob", "Carol"]
    2 ["Laura", "Mike", "Alice"]
    1 ["Laura"]
    1 ["Mike", "Carol"]
    1 ["David"]
    1 ["Nigel"]
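As a cross-check of the expected result, and as an illustration of the fallback of aggregating outside OrientDB, here is a minimal Python sketch. It is not an OrientDB query: the `names` and `next_edge` dictionaries are hand-copied from the insert and create edge statements above, the helper name `name_sequences` is made up for this example, and it assumes each node has at most one outgoing NEXT link and that there are no cycles.

```python
from collections import Counter

# ident -> name, copied by hand from the sample insert statements.
names = {
    1: "Bob", 2: "Bob", 3: "Carol", 4: "Carol", 5: "Laura",
    6: "Laura", 7: "Laura", 8: "Mike", 9: "David", 10: "Alice",
    11: "Nigel", 12: "Carol", 13: "Mike", 14: "Alice", 15: "Mike",
}

# ident -> ident of the node reached by the single outgoing NEXT edge.
next_edge = {1: 3, 2: 4, 8: 12, 5: 15, 15: 14, 7: 13, 13: 10}


def name_sequences(names, next_edge):
    """Walk every NEXT chain from its start node and collect the names."""
    targets = set(next_edge.values())
    # Start nodes are the ones that no NEXT edge points to.
    starts = [ident for ident in sorted(names) if ident not in targets]
    sequences = []
    for start in starts:
        sequence = []
        node = start
        while node is not None:  # assumes the chains contain no cycles
            sequence.append(names[node])
            node = next_edge.get(node)
        sequences.append(tuple(sequence))  # tuples are hashable, lists are not
    return sequences


counts = Counter(name_sequences(names, next_edge))
for sequence, number in counts.items():
    print(number, list(sequence))
```

Converting each sequence to a tuple is what lets `Counter` group identical lists; the same trick applies to rows fetched from the database by any client.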

Here is what I get using neRok's suggestions.

First, select all the start nodes - this works as expected

orientdb {db=people}> select from Person where in_NEXT is null

----+------+------+-----+----+-----+--------
#   |@RID  |@CLASS|name |age |ident|out_NEXT
----+------+------+-----+----+-----+--------
0   |#11:0 |Person|Bob  |30.5|1    |[#12:0]
1   |#11:1 |Person|Bob  |30.5|2    |[#12:1]
2   |#11:4 |Person|Laura|75.0|5    |[#12:3]
3   |#11:5 |Person|Laura|60.5|6    |null
4   |#11:6 |Person|Laura|46.0|7    |[#12:5]
5   |#11:7 |Person|Mike |16.3|8    |[#12:2]
6   |#11:8 |Person|David|86.0|9    |null
7   |#11:10|Person|Nigel|69.0|11   |null
----+------+------+-----+----+-----+--------

Now, if I try to get the array of names by traversing from these nodes

select $series.name from (select from Person where in_NEXT is null ) let $series = (traverse out('NEXT') from $current)

----+------+-------
#   |@CLASS|$series
----+------+-------
0   |null  |[0]
1   |null  |[0]
2   |null  |[0]
3   |null  |[0]
4   |null  |[0]
5   |null  |[0]
6   |null  |[0]
7   |null  |[0]
----+------+-------    

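I think this means that it either isn't getting the result of the traversal, or it can't produce an array of names?

The final aggregation step treats all of these rows as identical: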

orientdb {db=people}> select series, sum(1) as number from (select $series.name as series from (select from Person where in_NEXT is null) let $series = (traverse out('NEXT') from $current)) group by series

----+------+------+------
#   |@CLASS|series|number
----+------+------+------
0   |null  |[0]   |8
----+------+------+------

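So I haven't got the result I want yet.

I think the problem is in extracting the array of names from the traversal? A single traverse query does find the expected traversal, but I can't work out how to manipulate the data to give the array of names along that traversal.

Here is an example traversal from one node:

orientdb {db=people}> traverse out('NEXT') from (select from Person where ident = 7)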

----+------+------+-----+----+-----+--------+-------
#   |@RID  |@CLASS|name |age |ident|out_NEXT|in_NEXT
----+------+------+-----+----+-----+--------+-------
0   |#11:6 |Person|Laura|46.0|7    |[#12:5] |null
1   |#11:12|Person|Mike |16.3|13   |[#12:6] |[#12:5]
2   |#11:9 |Person|Alice|5.0 |10   |null    |[#12:6]
----+------+------+-----+----+-----+--------+-------

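1 Answer:

Answer 0 (score: 0)

First, you need a query that gets the nodes with no incoming edges. This query will be used as a subquery. The OrientDB manual suggests something like select from Nodes where in('NEXT').size() = 0, but the following seems to be a bit faster for me (YMMV): select from Nodes where in_NEXT is null

Now that we have a list of start nodes, we can use traverse to follow all of the outgoing edges.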

select $series.name from (
    select from Nodes where in_NEXT is null
)
let $series = (traverse out('NEXT') from $current)

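This query will return rows containing data such as ["Bob","Sharon","Carl"], as you wanted.

Now to count the occurrences of each series. I always end up scratching my head when dealing with unique/distinct and counts (so this might not be a good way to do it), but the following seems to work;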

select series, sum(1) as number from (
  select $series.name as series from (
    select from Nodes where in_NEXT is null
  ) 
  let $series = (traverse out('NEXT') from $current)
)
group by series

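So we wrap the previous query in another query that groups by series and then counts the rows in each group.

On another note, this seems like an 'expensive' query. Perhaps you could structure your database in a more efficient way? That might be another question/discussion, though.

Update to address your question update; using your database setup, all of my queries work in the Studio web-app. There are some 'quirks' in the console, though. The first query gives the following in the console;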

orientdb {db=stackpeople}> select $series.name from (select from Person where in_NEXT is null ) let $series = (traverse out('NEXT') from $current)

----+------+-------
#   |@CLASS|$series
----+------+-------
0   |null  |[2]
1   |null  |[2]
2   |null  |[3]
3   |null  |[1]
4   |null  |[3]
5   |null  |[2]
6   |null  |[1]
7   |null  |[1]
----+------+-------

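I don't know why your results show every $series row as [0]. Perhaps you weren't querying the correct field (i.e. you weren't using the demo database above). My other guess is that there is a problem with the version of OrientDB you are using - I am using 2.1-rc6.

The null class is correct, because this is a projection query rather than a record. It can be made to return the record by changing the start of the query to select *, $series.name from ...

For whatever reason, $series is not expanded in the console; instead it shows the number of records that $series contains ($series must be a list). I don't normally use the console, so I don't know whether this is the expected result or a bug (my guess is that it is just how the console displays lists). I did find a way to show the names, using the following query;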

orientdb {db=stackpeople}> select *, $series.name.asString() as names from (select from Person where in_NEXT is null ) let $series = (traverse out('NEXT') from $current)

----+------+------+-----+----+-----+--------+--------------------
#   |@RID  |@CLASS|name |age |ident|out_NEXT|names
----+------+------+-----+----+-----+--------+--------------------
0   |#11:0 |Person|Bob  |30.5|1    |[size=1]|[Bob, Carol]
1   |#11:1 |Person|Bob  |30.5|2    |[size=1]|[Bob, Carol]
2   |#11:4 |Person|Laura|75.0|5    |[size=1]|[Laura, Mike, Alice]
3   |#11:5 |Person|Laura|60.5|6    |null    |[Laura]
4   |#11:6 |Person|Laura|46.0|7    |[size=1]|[Laura, Mike, Alice]
5   |#11:7 |Person|Mike |16.3|8    |[size=1]|[Mike, Carol]
6   |#11:8 |Person|David|86.0|9    |null    |[David]
7   |#11:10|Person|Nigel|69.0|11   |null    |[Nigel]
----+------+------+-----+----+-----+--------+--------------------

Regarding the last query (the count per name list), my query gives;

orientdb {db=stackpeople}> select series, sum(1) as number from (select $series.name as series from (select from Person where in_NEXT is null) let $series = (traverse out('NEXT') from $current)) group by series

----+------+------+------
#   |@CLASS|series|number
----+------+------+------
0   |null  |[2]   |2
1   |null  |[3]   |2
2   |null  |[1]   |1
3   |null  |[2]   |1
4   |null  |[1]   |1
5   |null  |[1]   |1
----+------+------+------

This is again correct (although series is displayed as the number of items in each list). The display can be adjusted as follows;

orientdb {db=stackpeople}> select names.asString(), sum(1) as number from (select $series.name as names from (select from Person where in_NEXT is null) let $series = (traverse out('NEXT') from $current)) group by names

----+------+--------------------+------
#   |@CLASS|names               |number
----+------+--------------------+------
0   |null  |[Bob, Carol]        |2
1   |null  |[Laura, Mike, Alice]|2
2   |null  |[Laura]             |1
3   |null  |[Mike, Carol]       |1
4   |null  |[David]             |1
5   |null  |[Nigel]             |1
----+------+--------------------+------

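Note that changing the list to a string means you will need to split that string in whatever program/code receives the query result in order to get at the individual names. So it is probably better not to change it to a string, in which case the query will actually return a list.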
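To illustrate the extra work the string form creates, here is a rough Python sketch of the splitting step. It assumes the string looks exactly like the console output above (square brackets, names separated by a comma and a space) and that no name itself contains ", "; the function name `parse_name_list` and the sample `rows` are invented for this example, and a real client may hand back the value in a different shape.

```python
def parse_name_list(text):
    """Turn a string such as '[Laura, Mike, Alice]' back into a list of names."""
    inner = text.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not a bracketed list: {text!r}")
    inner = inner[1:-1].strip()
    # An empty pair of brackets means an empty list, not [''].
    return inner.split(", ") if inner else []


# Hypothetical (names, number) rows, as a client might receive them.
rows = [("[Bob, Carol]", 2), ("[Laura, Mike, Alice]", 2), ("[Laura]", 1)]
parsed = [(parse_name_list(names), number) for names, number in rows]
print(parsed)
```

This kind of parsing breaks as soon as a name contains the separator, which is one more reason to let the query return a real list where the client supports it.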