Time-based data query in Neo4j shows more relationships than expected

Time: 2016-06-11 09:59:57

Tags: neo4j cypher graph-databases

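I am trying to get my head around Neo4j and time-based data.

What I basically want to build is a data structure that, for a given time span, gives me the tracking nodes (page views) together with their referrers and the referrers of those referrers.

My problem is that when I save the data and its relationships against a time tree, relationships still show up that should not be visible when I query for a specific hour.

While researching, I found this article about modeling time series data with neo4j.

So far so good, but the referrers and their child relationships are not scoped by time.

To illustrate the problem, here is the data structure first.

I created these indexes: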

CREATE INDEX ON :Year(value);
CREATE INDEX ON :Month(value);
CREATE INDEX ON :Day(value);
CREATE INDEX ON :Hour(value);
CREATE INDEX ON :Minute(value);
CREATE INDEX ON :Second(value);

Then I put the time nodes in place:

// Create time tree with day depth
WITH range(2015, 2017) AS years, range(1,12) AS months
FOREACH(year IN years |
  CREATE (y:Year {value: year})
  FOREACH(month IN months |
    CREATE (m:Month {value: month})
    MERGE (y)-[:CONTAINS]->(m)
    FOREACH(day IN (CASE
                      WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31)
                      WHEN month = 2 THEN
                        CASE
                          // Not divisible by 4: common year
                          WHEN year % 4 <> 0 THEN range(1,28)
                          // Century years are leap years only if divisible by 400
                          WHEN year % 100 = 0 AND year % 400 <> 0 THEN range(1,28)
                          ELSE range(1,29)
                        END
                      ELSE range(1,30)
                    END) |
      CREATE (d:Day {value: day})
      MERGE (m)-[:CONTAINS]->(d))))
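
A quick sanity check of the tree can be sketched as follows. It uses only the labels and the relationship type created above; the aliases `month` and `days` are illustrative. It counts the days stored under each month of one year. Since 2016 is a leap year, February 2016 should come back with 29 days.

```cypher
// Count the Day nodes under every Month of 2016
MATCH (:Year {value: 2016})-[:CONTAINS]->(m:Month)-[:CONTAINS]->(d:Day)
RETURN m.value AS month, count(d) AS days
ORDER BY month;
```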

If I now save data:

MERGE (a:tracking {ip:'someniceid', type:'page_view', timestamp:'2154645'})
MERGE (f:Domain {name:'domain1.com'})
MERGE (e:Domain {name:'domain2.com'})
MERGE (d:Domain {name:'domain3.com'})
MERGE (z:Domain {name:'domain4.com'})
MERGE (a)-[:CAME_FROM]->(f)
MERGE (f)-[:REFERRED_BY]->(e)
MERGE (e)-[:REFERRED_BY]->(d)
MERGE (d)-[:REFERRED_BY]->(z)
WITH a, 2016 AS y 
MATCH (year:Year {value: y})
WITH a, year, 5 AS m 
MATCH (year)-[:CONTAINS]->(month:Month {value: m})
WITH a, month, 9 AS d 
MATCH (month)-[:CONTAINS]->(day:Day {value: d})
WITH a, day, 14 AS h 
MERGE (day)-[:CONTAINS]->(hour:Hour {value: h})
MERGE (a)-[:HAPPENED_ON]->(hour)

With the following query I get the graph shown below:

MATCH (y)-[:CONTAINS]->(m:Month {value: 5}) WITH y, m
MATCH (m)-[:CONTAINS]->(d {value: 9}) WITH y, m, d
MATCH (d)-[:CONTAINS]->(h {value: 14}) WITH y, m, d, h
MATCH (a:tracking)-[:HAPPENED_ON]->(h),(a)-[:CAME_FROM|:REFERRED_BY*]->(dom) RETURN dom AS D, a AS A

[image: graph returned by the query]

When I now save one more data set, where the only differences are the hour and one domain (instead of domain4 we now have domain6), like this:

MERGE (a:tracking {ip:'someniceid', type:'page_view', timestamp:'2154645'})
MERGE (f:Domain {name:'domain1.com'})
MERGE (e:Domain {name:'domain2.com'})
MERGE (d:Domain {name:'domain3.com'})
MERGE (z:Domain {name:'domain6.com'})
MERGE (a)-[:CAME_FROM]->(f)
MERGE (f)-[:REFERRED_BY]->(e)
MERGE (e)-[:REFERRED_BY]->(d)
MERGE (d)-[:REFERRED_BY]->(z)
WITH a, 2016 AS y 
MATCH (year:Year {value: y})
WITH a, year, 5 AS m 
MATCH (year)-[:CONTAINS]->(month:Month {value: m})
WITH a, month, 9 AS d 
MATCH (month)-[:CONTAINS]->(day:Day {value: d})
WITH a, day, 10 AS h 
MERGE (day)-[:CONTAINS]->(hour:Hour {value: h})
MERGE (a)-[:HAPPENED_ON]->(hour)

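As a result, running the same query as above now also returns one more referrer, which I think should not happen, because that referrer belongs to a tracking node attached to a different time (hour) node: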

[image: graph returned by the same query after the second insert]

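Although the tracking is connected to a different hour node, the referrer relationship is still shown! What am I doing wrong? To me, domain6 should not be visible, because the related tracking is not associated with that time node. Does anyone have an idea?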
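
One way to inspect what the two write statements actually stored is sketched below, assuming both were run exactly as posted. It uses only the labels, properties and relationship types from the statements above; the aliases `hours`, `domain` and `referred_by` are illustrative. The two statements are independent and can be run one at a time.

```cypher
// Which Hour nodes is each tracking node attached to?
MATCH (a:tracking)-[:HAPPENED_ON]->(h:Hour)
RETURN a.ip AS ip, a.timestamp AS timestamp, collect(h.value) AS hours;

// Which referrers does each Domain node point to, regardless of time?
MATCH (d:Domain)-[:REFERRED_BY]->(r:Domain)
RETURN d.name AS domain, collect(r.name) AS referred_by;
```

If a tracking node lists more than one hour, or a domain lists referrers that came from different page views, the data is shared between events rather than scoped by time.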

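1 Answer:

Answer 0 (score: 2)

The problem is that MERGE does not create a new Domain record for each tracked event, so the referrer chains of different events end up sharing the same Domain nodes, and the sequence of domains you stored is wrong. Try creating, for each tracking, its own links that point to the domains: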

MERGE (a:tracking {ip:'someniceid', type:'page_view', timestamp:'2154645'})
MERGE (_f:Domain {name:'domain1.com'})
MERGE (_e:Domain {name:'domain2.com'})
MERGE (_d:Domain {name:'domain3.com'})
MERGE (_z:Domain {name:'domain4.com'})
CREATE (f:Symlink)-[:Symlink]->(_f)
CREATE (e:Symlink)-[:Symlink]->(_e)
CREATE (d:Symlink)-[:Symlink]->(_d)
CREATE (z:Symlink)-[:Symlink]->(_z)
MERGE (a)-[:CAME_FROM]->(f)
MERGE (f)-[:REFERRED_BY]->(e)
MERGE (e)-[:REFERRED_BY]->(d)
MERGE (d)-[:REFERRED_BY]->(z)
WITH a, 2016 AS y 
MATCH (year:Year {value: y})
WITH a, year, 5 AS m 
MATCH (year)-[:CONTAINS]->(month:Month {value: m})
WITH a, month, 9 AS d 
MATCH (month)-[:CONTAINS]->(day:Day {value: d})
WITH a, day, 14 AS h 
MERGE (day)-[:CONTAINS]->(hour:Hour {value: h})
MERGE (a)-[:HAPPENED_ON]->(hour)
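
With this model the referrer chain runs through the `Symlink` nodes, so a read query has to take one extra hop to reach the actual `Domain`. The following is a minimal sketch of how the original hourly query might be adapted, not a tested solution. It assumes the labels and relationship types used above, and it writes the relationship alternatives as `CAME_FROM|REFERRED_BY` (without the second colon), which newer Neo4j versions require.

```cypher
// Walk the time tree down to the requested hour
MATCH (:Year {value: 2016})-[:CONTAINS]->(:Month {value: 5})
      -[:CONTAINS]->(:Day {value: 9})
      -[:CONTAINS]->(h:Hour {value: 14})
// Tracking nodes recorded in that hour
MATCH (a:tracking)-[:HAPPENED_ON]->(h)
// Follow this tracking's own Symlink chain, then resolve each Symlink to its Domain
MATCH (a)-[:CAME_FROM|REFERRED_BY*]->(s:Symlink)-[:Symlink]->(dom:Domain)
RETURN a AS A, dom AS D;
```

This only separates events if each page view is stored as its own `tracking` node. A `MERGE` on identical `ip`, `type` and `timestamp` values matches the existing node, so two page views with the same properties would still be attached to both hours.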