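I created three kinds of nodes in a graph database: origin airport, destination airport, and carrier. They are connected by a relationship named "cancelled_by".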
MATCH (origin:origin_airport {name: row.ORIGIN}),
(destination:dest_airport {name: row.DEST}),
(carrier:Carrier {name: row.UNIQUE_CARRIER})
CREATE (origin)-[:cancelled_by {cancellation: row.count}]->(carrier)
CREATE (origin)-[:cancelled_by {cancellation: row.count}]->(destination)
CREATE (origin)-[:operated_by {carrier: row.UNIQUE_CARRIER}]->(carrier)
The cancelled_by relationship stores the number of cancellations for a particular carrier. My input file has the following format:
ORIGIN UNIQUE_CARRIER DEST Cancelled
ABE DL ATL 1
ABE EV ATL 1
ABE EV DTW 3
ABE EV ORD 3
ABQ DL DFW 2
ABQ B6 JFK 2
Here I need to calculate the percentage of cancellations for each carrier. The result I expect is as follows:
UNIQUE_CARRIER Percentage_Cancelled
DL 25%
EV 58.33%
B6 16.66%
Example: Total number of cancellations = 12
Number of cancellations for DL = 3
Percentage of cancellation for DL = (3/12)*100 = 25%
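The arithmetic above can be checked outside Neo4j with a short Python sketch. It assumes only the six sample rows from the input file, and the helper name cancellation_percentages is hypothetical. It illustrates the expected result and is not part of the Cypher solution.

```python
from collections import defaultdict

# Sample rows from the input file: (ORIGIN, UNIQUE_CARRIER, DEST, Cancelled)
ROWS = [
    ("ABE", "DL", "ATL", 1),
    ("ABE", "EV", "ATL", 1),
    ("ABE", "EV", "DTW", 3),
    ("ABE", "EV", "ORD", 3),
    ("ABQ", "DL", "DFW", 2),
    ("ABQ", "B6", "JFK", 2),
]


def cancellation_percentages(rows):
    """Return {carrier: share of all cancellations, in percent} (hypothetical helper)."""
    per_carrier = defaultdict(int)
    for _origin, carrier, _dest, cancelled in rows:
        per_carrier[carrier] += cancelled  # group by carrier and sum
    total = sum(per_carrier.values())      # grand total across all carriers
    return {carrier: 100.0 * count / total for carrier, count in per_carrier.items()}


if __name__ == "__main__":
    result = cancellation_percentages(ROWS)
    # Print carriers from highest to lowest share
    for carrier, percent in sorted(result.items(), key=lambda kv: -kv[1]):
        print(f"{carrier} {percent:.2f}%")
```

With rounding to two decimals, B6 comes out as 16.67% rather than the truncated 16.66% shown above.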
The following query gives the sum of cancellations for each carrier:
MATCH ()-[ca:cancelled_by]->(c:Carrier)
RETURN c.name AS Carrier,
SUM(toFloat(ca.cancellation)) As sum
ORDER BY sum DESC
LIMIT 10
I tried the following query to calculate the percentage:
MATCH ()-[ca:cancelled_by]->(c:Carrier)
WITH SUM(toFloat(ca.cancellation)) As total
MATCH ()-[ca:cancelled_by]->(c:Carrier)
RETURN c.name AS Carrier,
(toFloat(ca.cancellation)/total)*100 AS percent
ORDER BY percent DESC
LIMIT 10
But instead of calculating the percentage per group, it calculates a percentage for each row individually.
Carrier sum
DL 0.36862408915559364
DL 0.34290612944706383
DL 0.3171881697385341
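The repeated DL rows appear because the second MATCH returns one row per relationship, so each value is a single relationship's share of the total rather than the carrier's share. A small Python sketch on the six sample rows (not the full data set that produced the output above, which is not shown) illustrates the effect. The name per_row_share is hypothetical.

```python
# (carrier, cancelled) pairs taken from the six sample input rows
SAMPLE = [("DL", 1), ("EV", 1), ("EV", 3), ("EV", 3), ("DL", 2), ("B6", 2)]


def per_row_share(pairs):
    """Return one (carrier, percent) entry per input row, with no grouping."""
    total = sum(cancelled for _carrier, cancelled in pairs)
    return [(carrier, 100.0 * cancelled / total) for carrier, cancelled in pairs]


shares = per_row_share(SAMPLE)

# DL shows up once per row instead of once per carrier
dl_rows = [percent for carrier, percent in shares if carrier == "DL"]
print(len(dl_rows), sum(dl_rows))
```

Summing the per-row values for one carrier recovers that carrier's percentage, which is the grouping step the query is missing.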
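How can I calculate a percentage based on group_by using a Cypher query in Neo4j?
Answer 0 (score: 4)
You forgot to sum per carrier when grouping. Also, you do not need to cast to float everywhere: multiplying by a float (100.0) in the final calculation is enough.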
MATCH ()-[ca:cancelled_by]->(:Carrier)
WITH SUM(ca.cancellation) As total
MATCH ()-[ca:cancelled_by]->(c:Carrier)
RETURN c.name AS Carrier,
100.0 * SUM(ca.cancellation) / total AS percent
ORDER BY percent DESC
LIMIT 10
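Answer 1 (score: 0)
Hi, you could try the R package dplyr. Use the chaining operator %>% together with the functions group_by, summarize, and transmute. group_by and summarize will give you the sum of cancellations within each group. Use transmute to get the relative frequencies.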