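I am trying to run a CDR (call detail record) analysis on mobile call data. A call is made by a PERSON, goes THROUGH a tower, and CONNECTS to a number. I want to isolate calls that were made before a specific date and time, where the calling number no longer appears in the records after that date and time. My current query only shows the data from before the specific event I am looking for: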
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE toInteger(t.time)<1500399900
RETURN a,b
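However, how do I now isolate only the records that exist before t.time=1500399900 and not after? Also, unless I limit the query above to 1000 results, my browser (Google Chrome) crashes. Is there a solution for that?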
In case it helps, this is how I loaded the CSV file into neo4j:
//Setup initial constraints
CREATE CONSTRAINT ON (a:PERSON) assert a.number is unique;
CREATE CONSTRAINT ON (b:TOWER) assert b.id is unique;
//Create the appropriate nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MERGE (a:PERSON {number: line.Calling})
MERGE (b:PERSON {number: line.Called})
MERGE (c:TOWER {id: line.CellID1})
//Setup proper indexing
DROP CONSTRAINT ON (a:PERSON) ASSERT a.number IS UNIQUE;
DROP CONSTRAINT ON (a:TOWER) ASSERT a.id IS UNIQUE;
CREATE INDEX ON :PERSON(number);
CREATE INDEX ON :TOWER(id);
//Create relationships between people and calls
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///combined.csv" AS line
MATCH (a:PERSON {number: line.Calling}),(b:PERSON {number: line.Called}),(c:TOWER {id: line.CellID1})
CREATE (a)-[t:THROUGH]->(c)-[x:CONNECTS]->(b)
SET x.calltype = line.CallType, x.provider = line.Provider, t.time=toInteger(line.ts), t.duration=toInteger(line.Duration)
Answer 0 (score: 1)
However, how do I now isolate only the records that exist before t.time=1500399900 and not after?
Let's create a small example dataset:
CREATE
(a1:PERSON {name: 'a1'}), (a2:PERSON {name: 'a2'}),
(b1:PERSON {name: 'b1'}), (b2:PERSON {name: 'b2'}),
(b3:PERSON {name: 'b3'}), (b4:PERSON {name: 'b4'}),
(a1)-[:THROUGH {time: 1}]->(:TOWER)-[:CONNECTS]->(b1),
(a1)-[:THROUGH {time: 3}]->(:TOWER)-[:CONNECTS]->(b2),
(a2)-[:THROUGH {time: 2}]->(:TOWER)-[:CONNECTS]->(b3),
(a2)-[:THROUGH {time: 15}]->(:TOWER)-[:CONNECTS]->(b4)
Visualized, it looks like this:
This query might solve the problem for you:
MATCH (a:PERSON)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
WHERE toInteger(t1.time) < 5
OPTIONAL MATCH (a)-[t2:THROUGH]->(:TOWER)
WHERE t2.time >= 5
WITH a, b, t1, t2
WHERE t2 IS NULL
RETURN a, b, t1
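After the first match, it looks for calls started by PERSON a at or after timestamp 5. There may be no such calls, so we use OPTIONAL MATCH. If there is no call at or after the specified timestamp, t2 will be null, so we do an IS NULL check and return the filtered results.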
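The same filter can also be sketched with an aggregation instead of OPTIONAL MATCH. This is only an alternative formulation, not part of the original answer: it computes each caller's latest call time and keeps callers whose latest call is before the cutoff. The alias lastCallTime is a name introduced here for illustration, and the cutoff 5 matches the example dataset above.

```cypher
// Sketch: keep only callers whose most recent call is before the cutoff.
// lastCallTime is an illustrative alias, not a property in the data.
MATCH (a:PERSON)-[t:THROUGH]->(:TOWER)
WITH a, max(t.time) AS lastCallTime
WHERE lastCallTime < 5
// Every remaining call by these callers is necessarily before the cutoff.
MATCH (a)-[t1:THROUGH]->(:TOWER)-[:CONNECTS]->(b:PERSON)
RETURN a, b, t1
```

On the example dataset this should keep the two calls made by a1 (times 1 and 3) and drop a2, whose latest call is at time 15. For the real data, replace 5 with 1500399900.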
Also, unless I limit the query above to 1000 results, my browser (Google Chrome) crashes. Is there a solution for that?
If you use a graph visualization tool, it usually cannot render more than a few hundred nodes. Possible workarounds:
Use SKIP ... LIMIT ... to paginate the results.
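As a rough sketch of that pagination, applied to the query from the question: the page size of 500 and the column aliases are assumptions chosen for illustration, and returning plain properties instead of whole nodes is an additional choice made here so the browser shows a table rather than drawing a graph.

```cypher
// Sketch: fetch one page of results at a time.
// Page size (500) and the aliases caller/callee/time are illustrative.
MATCH (a:PERSON)-[t:THROUGH]->()-[:CONNECTS]->(b)
WHERE t.time < 1500399900
RETURN a.number AS caller, b.number AS callee, t.time AS time
ORDER BY time
SKIP 0 LIMIT 500
```

For the next page, raise SKIP by the page size (SKIP 500 LIMIT 500, then SKIP 1000 LIMIT 500, and so on). The ORDER BY keeps the pages stable between runs.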