Neo4j Cypher query with hierarchical relationships

Time: 2014-07-30 16:05:24

Tags: neo4j cypher

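I have a database containing movies. Movies are released in regions that form a hierarchy. The hierarchy is as follows: (Global)-[contains]->(EU), (Global)-[contains]->(US), (EU)-[contains]->(UK), (EU)-[contains]->(SE).

I want a Cypher query that returns the release of each movie for my region, or for one of the regions higher up in the hierarchy.

If I am in the UK and a movie has both a UK release and an EU release, I want only the UK release returned. If it was released in the EU but has no specific UK release, I want to fall back to the EU release.

The problem is how to avoid duplicates.

My data has the following structure, and I want one release returned per movie: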

(Movie1)-[has_release]->(release1)-[has_region]->(EU) 
(Movie1)-[has_release]->(release2)-[has_region]->(Global)
(Movie2)-[has_release]->(release3)-[has_region]->(UK)
(Movie2)-[has_release]->(release4)-[has_region]->(US)

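In this case, when I query for movies in the UK, I want release1 (and release3) returned, because EU contains UK. I do not want release2 returned, because a closer release already exists for Movie1. In other words, I want the release whose region is closest to the UK in the region hierarchy, which here is EU.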

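2 answers:

Answer 0 (score: 9)

This is a great question, and I enjoyed working out an answer. I will walk through my solution step by step. First, here is the sample data I tested with: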

CREATE 

(Global:Region {name:'Global'}),
(US:Region {name:'US'}),
(EU:Region {name:'EU'}),
(UK:Region {name:'UK'}),
(SE:Region {name:'SE'}),

(Global)-[:CONTAINS]->(EU),
(Global)-[:CONTAINS]->(US),
(EU)-[:CONTAINS]->(UK),
(EU)-[:CONTAINS]->(SE),

(Movie1:Movie {name:'Movie 1'}),
(Movie2:Movie {name:'Movie 2'}),
(Release1:Release {name:'Release 1'}),
(Release2:Release {name:'Release 2'}),
(Release3:Release {name:'Release 3'}),
(Release4:Release {name:'Release 4'}),

(Movie1)-[:HAS_RELEASE]->(Release1)-[:HAS_REGION]->(EU),
(Movie1)-[:HAS_RELEASE]->(Release2)-[:HAS_REGION]->(Global),
(Movie2)-[:HAS_RELEASE]->(Release3)-[:HAS_REGION]->(UK),
(Movie2)-[:HAS_RELEASE]->(Release4)-[:HAS_REGION]->(US);

Here is my solution...

MATCH p = (m:Movie)-[:HAS_RELEASE]->(:Release)-[:HAS_REGION]->(:Region)-[:CONTAINS*0..]->(:Region {name:'UK'})
WITH m, p
ORDER BY LENGTH(p)
WITH m, HEAD(COLLECT(p)) AS path
RETURN m.name AS Movie, [x IN NODES(path) WHERE x:Release | x.name] AS Release;

...which produces:

Movie    Release
Movie 1  Release 1
Movie 2  Release 3

OK, let's walk through this query step by step. The first part...

MATCH p = (m:Movie)-[:HAS_RELEASE]->(:Release)-[:HAS_REGION]->(:Region)-[:CONTAINS*0..]->(:Region {name:'UK'})

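...matches movies to releases in any region that contains the UK region, at any depth. Note that *0.. means releases assigned directly to UK are still captured, since that is a zero-length step.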
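
To make the effect of the zero-or-more CONTAINS hops concrete, here is a minimal Python sketch (not Cypher) that walks up the sample hierarchy starting from UK. The PARENT dictionary and the ancestors_with_hops helper are hypothetical names introduced only for illustration, and the sketch assumes each region has at most one containing region, as in the sample data.

```python
# Containment edges from the sample data, stored child -> parent.
# Assumption: every region has at most one parent, as in the example.
PARENT = {"EU": "Global", "US": "Global", "UK": "EU", "SE": "EU"}


def ancestors_with_hops(region):
    """Return {region_name: hops} for the region itself and every region containing it."""
    hops = {}
    distance = 0
    current = region
    while current is not None:
        hops[current] = distance  # distance 0 is the region itself
        current = PARENT.get(current)
        distance += 1
    return hops


uk_hops = ancestors_with_hops("UK")
print(uk_hops)
```

UK itself appears with zero hops, which corresponds to the zero-length step; US never appears because it does not contain UK.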

Then, for each movie, we order by path length, because for movies with multiple paths (like Movie 1) we want the shortest path to come first...

WITH m, p
ORDER BY LENGTH(p)

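...because we want to collect the paths and keep only the one that reaches the UK node most directly (that is the first path in the collection, since we sorted by path length in ascending order):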

WITH m, HEAD(COLLECT(p)) AS path
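
The sort-then-take-first idea can be sketched outside Cypher as well. The Python below is only an illustration: the rows list is written by hand to mirror what the MATCH step would yield for the sample data, and closest_release_per_movie is a hypothetical helper name.

```python
# Rows the MATCH step would yield for the sample data when querying UK:
# (movie, release, path length). The length counts HAS_RELEASE, HAS_REGION
# and each CONTAINS hop down to UK.
rows = [
    ("Movie 1", "Release 2", 4),  # region Global, two CONTAINS hops to UK
    ("Movie 1", "Release 1", 3),  # region EU, one CONTAINS hop to UK
    ("Movie 2", "Release 3", 2),  # released directly in UK
]


def closest_release_per_movie(rows):
    """Sort by path length, then keep the first release seen for each movie."""
    best = {}
    for movie, release, _length in sorted(rows, key=lambda r: r[2]):
        best.setdefault(movie, release)  # later (longer) paths are ignored
    return best


result = closest_release_per_movie(rows)
print(result)
```

Release 4 is absent from the rows because its region, US, does not contain UK, so the pattern never matches it.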

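Now we have one path per movie. The last line uses a list comprehension (which does the work of EXTRACT and FILTER combined) to get the Release node name from each path: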

RETURN m.name AS Movie, [x IN NODES(path) WHERE x:Release | x.name] AS Release
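
For readers more familiar with Python, the filter-and-project step has a close counterpart there. In this sketch the path is modelled, purely as an assumption for illustration, as a list of (label, name) pairs in path order.

```python
# The path kept for Movie 1, as (label, name) pairs in path order.
path = [
    ("Movie", "Movie 1"),
    ("Release", "Release 1"),
    ("Region", "EU"),
    ("Region", "UK"),
]

# Counterpart of [x IN NODES(path) WHERE x:Release | x.name]:
# keep only Release nodes, then project each one to its name.
release_names = [name for label, name in path if label == "Release"]
print(release_names)
```

As in the Cypher version, the result is a list, here holding the single Release node on the path.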

Answer 1 (score: 2)

MATCH regions = (a:Region)-[:CONTAINS*]->(b:Region)
WHERE b.title = "UK"
WITH regions
MATCH (m:Movie {title: "The Matrix"})
WITH m, regions
MATCH p = (m)-[:HAS_RELEASE]->(rel:Release)-[:HAS_REGION]->(reg:Region)-[:CONTAINS*0..]->(regMin)
WHERE reg IN nodes(regions)
WITH rel
MATCH p = (a:Region)-[:CONTAINS*0..]->(b:Region)<-[:HAS_REGION]-(rel)
WITH COLLECT(p) AS paths, MAX(length(p)) AS maxLength
WITH FILTER(path IN paths WHERE length(path) = maxLength) as path
WITH path UNWIND path AS result
RETURN FILTER(p IN nodes(result) WHERE p:Release)