Cypher query to isolate records by time

Time: 2017-11-01 07:08:06

Tags: neo4j cypher

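I am trying to do CDR (call detail record) analysis on mobile call data. A call is made by a PERSON, goes THROUGH a tower, and CONNECTS to a number. I want to isolate calls that were made before a particular date and time, where the calling number does not appear in the records after that date and time. My current query only shows the data from before the specific event I am looking for: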

MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b

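However, how do I now isolate only those records that exist before t.time=1500399900 but not after it? Also, if I don't limit the above query to 1000, my browser (Google Chrome) crashes. Is there a solution for that?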

After running the query as suggested, this is what the EXPLAIN plan looks like: [image not available]

In case it helps, this is how I loaded the CSV file into neo4j:

//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) assert a.number is unique;
CREATE CONSTRAINT ON (b:TOWER) assert b.id is unique;

//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})


//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;

CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);

//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET  x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)

1 answer:

Answer 0 (score: 1)

  

However, how do I now isolate only those records that exist before t.time=1500399900 but not after it?

Let's create a small example data set:

CREATE 
  (a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
  (b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
  (b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
  (a1)-[:THROUGH {time:  1}]->(:TOWER)-[:CONNECTS]->(b1),
  (a1)-[:THROUGH {time:  3}]->(:TOWER)-[:CONNECTS]->(b2),
  (a2)-[:THROUGH {time:  2}]->(:TOWER)-[:CONNECTS]->(b3),
  (a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)

Visualized, this is two callers (a1 and a2), each reaching two callees through a separate TOWER node per call.

This query may solve the problem for you:

MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1

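After the first match, it looks for calls by PERSON a that were initiated at or after timestamp 5. There may be no such calls, which is why we use OPTIONAL MATCH. If there are no calls after the specified timestamp, t2 will be null, so we do an IS NULL check and return the filtered results.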
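An equivalent way to express the same filter, as a rough sketch: compute each caller's latest call time with max() and keep only the callers whose latest call precedes the cutoff. This assumes t.time is stored as an integer (as in the load script from the question) and reuses the cutoff of 5 from the sample data; the alias lastCall is just an illustrative name.

```cypher
// Find the latest outgoing call time for every caller.
MATCH (a:PERSON)-[t:THROUGH]->(:TOWER)
WITH a, max(t.time) AS lastCall
// Keep only callers with no call at or after the cutoff (5 here).
WHERE lastCall < 5
// Fetch the calls of the remaining callers; all of them predate the cutoff.
MATCH (a)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
RETURN a, b, t1
```

On the sample data set this should return only the two calls made by a1 (to b1 and b2), because a2 also has a call at time 15.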

  

Also, if I don't limit the above query to 1000, my browser (Google Chrome) crashes. Is there a solution for that?

If you use the graph visualizer, it usually cannot render more than a few hundred nodes. Possible workarounds:

  • Use the Text view of the web UI (Neo4j Browser), which scales better than the graph view.
  • Use SKIP ... LIMIT ... for pagination.
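The pagination workaround could look roughly like the sketch below. It returns scalar properties instead of whole nodes, which keeps the result in tabular form; the page size of 100 and the aliases caller, callee and time are arbitrary choices for illustration, and t.time is assumed to be stored as an integer.

```cypher
MATCH (a:PERSON)-[t:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE t.time < 1500399900
RETURN a.number AS caller, b.number AS callee, t.time AS time
// A fixed ordering keeps the pages consistent from one query to the next.
ORDER BY time, caller, callee
// First page; raise SKIP by the page size (100, 200, ...) for later pages.
SKIP 0 LIMIT 100
```

Without the ORDER BY, the rows that end up on each page are not guaranteed to be the same between runs.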