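Neo4j: percentage of a property across a social network

Time: 2017-11-27 13:55:54

Tags: graph neo4j cypher

How can I calculate the percentage of a property across all connections in a social network? In this particular sample, I want to calculate fraud among users by evaluating their interactions (calls, SMS):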

CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})


CREATE
  (Alice)-[:CALL]->(Bob),
  (Bob)-[:SMS]->(Charlie),
  (Charlie)-[:SMS]->(Bob),
  (Fanny)-[:SMS]->(Charlie),
  (Esther)-[:SMS]->(Fanny),
  (Esther)-[:CALL]->(David),
  (David)-[:CALL]->(Alice),
  (David)-[:SMS]->(Esther),
  (Alice)-[:CALL]->(Esther),
  (Alice)-[:CALL]->(Fanny),
  (Fanny)-[:CALL]->(Fraudster)

When I try this query:

MATCH (a)-->(b)
WHERE b.fraud = 1
RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)

I get the following error:

Invalid input '>': expected 0..9, '.', UnsignedHexInteger, UnsignedOctalInteger or UnsignedDecimalInteger (line 3, column 33 (offset: 66))
"RETURN (count() / ( MATCH (a) -->(b) RETURN count() ) * 100)"
                                 ^


2 Answers:

Answer 0: (score: 1)

In the RETURN clause you start a new query: MATCH (a) -->(b) RETURN count()

Neo4j does not allow this. Use the WITH keyword to chain the two parts instead:

MATCH ()-->() 
WITH count(*) AS total
  MATCH ()-->(b)
  WHERE b.fraud = 1
  RETURN toFloat(count(*)) / total * 100

Or, since in your case you only need the total number of relationships in the database, you can use this query:

MATCH ()-->(b)
WHERE b.fraud = 1
RETURN toFloat(count(*)) / size(()-->()) * 100
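
As a side note, the size(()-->()) form relies on applying size() to a pattern expression, which as far as I know is no longer accepted in Neo4j 5. A minimal sketch of the same idea, assuming Neo4j 5 or later and its COUNT { } subquery expression (the names fraudLinks and percentage are illustrative only):

```cypher
// Count the relationships that point at a node flagged as fraud.
MATCH ()-->(b)
WHERE b.fraud = 1
WITH count(*) AS fraudLinks
// COUNT { ... } counts every directed relationship (assumes Neo4j 5+).
RETURN toFloat(fraudLinks) / COUNT { ()-->() } * 100 AS percentage
```

Aggregating in a separate WITH step keeps the aggregate out of the final expression, which sidesteps the stricter grouping rules of newer versions.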

Update

  • Added toFloat to the Cypher queries; otherwise the division returns an integer instead of a float
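
To see why toFloat matters here, the following sketch plugs in the counts from the sample data (2 relationships that point at a fraud node, 11 relationships in total). The column names are illustrative only:

```cypher
// Both operands are integers, so 2 / 11 truncates to 0 before the multiplication.
RETURN 2 / 11 * 100 AS integerResult,
       // Converting one operand to a float makes the whole expression a float.
       toFloat(2) / 11 * 100 AS floatResult
```

integerResult should come out as 0, while floatResult should be roughly 18.18.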

Answer 1: (score: 1)

This query returns, for each fraudster, the percentage of all connections that point to that node:

MATCH (:Person)-[:CALL|:SMS]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage

With the sample data, the result is:

+----------------------------------------------+
| f                        | percentage        |
+----------------------------------------------+
| Node[13]{id:"h",fraud:1} | 9.090909090909092 |
| Node[6]{id:"a",fraud:1}  | 9.090909090909092 |
+----------------------------------------------+

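This query needs only a single scan of the database, and it explicitly specifies the node labels and relationship types, which filters out any other data that may exist in the database.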
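
If a single overall percentage is wanted rather than one row per fraudster, the same single-scan idea can be sketched with conditional aggregation. This is an alternative formulation, not the query from the answer above, and the names total, fraudLinks and percentage are illustrative only:

```cypher
// One pass over all CALL and SMS relationships between Person nodes.
MATCH (:Person)-[:CALL|SMS]->(b:Person)
WITH count(*) AS total,
     // Add 1 for every relationship whose target is flagged as fraud.
     sum(CASE WHEN b.fraud = 1 THEN 1 ELSE 0 END) AS fraudLinks
RETURN toFloat(fraudLinks) / total * 100 AS percentage
```

With the sample data, 2 of the 11 relationships end at a fraud node, so the result should be about 18.18, which is the sum of the two per-node rows shown above.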