配置:
Windows 8.1
的Neo4j企业-2.2.0-M03
缓存类型:hpc
8Gb RAM
6Gb用于JVM堆(wrapper.java.initmemory = 6144 wrapper.java.maxmemory = 6144)
用于映射内存的6Gb JVM堆中的5Gb(dbms.pagecache.memory = 5G)
型号:
模型表示用户如何浏览网站。
27 522 896个节点(394Mb)
111 294 796关系(3609Mb)
33 906 363处房产(1326Mb)
293(:页)节点
27522603(:PageView)节点
0(:用户)节点(尚未加载)
每个(:PageView)节点与(:Page)节点连接
每个(:PageView)节点与next(:PageView)节点连接
每个(:PageView)节点与(:User)节点(尚未)
查询
match (:Page {Name:'#########.aspx'})<-[:At]-(:PageView)-[:Next]->(:PageView)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
个人资料信息:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
195363 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter(0)
|
+Expand(All)(0)
|
+Filter(1)
|
+Expand(All)(1)
|
+Filter(2)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID105, FRESHID110, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID105, FRESHID110 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID105, FRESHID110 | |
| Projection(1) | 776404 | 35941815 | 71883630 | FRESHID105, p | |
| Filter(0) | 776404 | 35941815 | 35941815 | p | (NOT(anon[38] == anon[78]) AND hasLabel(p:Page)) |
| Expand(All)(0) | 776404 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Filter(1) | 384001 | 13345621 | 13345621 | | hasLabel(anon[67]:PageView) |
| Expand(All)(1) | 384001 | 13345621 | 19478500 | | ()-[:Next]->() |
| Filter(2) | 189923 | 6132879 | 6132879 | | hasLabel(anon[46]:PageView) |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-------------------------------------------+--------------------------------------------------+
Total database accesses: 202202762
查询没有不必要的标签
match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
个人资料信息:
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "#####################.aspx" | 5172680 |
| "###############.aspx" | 3846455 |
| "#########.aspx" | 3579022 |
| "###########.aspx" | 3051043 |
| "#############################.aspx" | 1713004 |
| "############.aspx" | 1373928 |
| "############.aspx" | 1338063 |
| "#####.aspx" | 1285447 |
| "###################.aspx" | 884077 |
| "##############.aspx" | 759665 |
+------------------------------------------------+
10 rows
166751 ms
Compiler CYPHER 2.2
Planner COST
Projection(0)
|
+Top
|
+EagerAggregation
|
+Projection(1)
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
| Projection(0) | 881 | 10 | 0 | FRESHID82, FRESHID87, count, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | FRESHID82, FRESHID87 | { AUTOINT1}; |
| EagerAggregation | 881 | 173 | 0 | FRESHID82, FRESHID87 | |
| Projection(1) | 776388 | 35941815 | 71883630 | FRESHID82, p | |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+-----------------------------------------+---------------------------+
Total database accesses: 146782447
Message.log
的问题
如何更快地执行此查询? (更多RAM,重构查询,分布式缓存,使用其他语言/ shell /方法,...)
UPD:
回答中最后一个查询的个人资料信息
neo4j-sh (?)$ profile match (:Page {Name:'Dashboard.aspx'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count;
+------------------------------------------------+
| p.Name | count |
+------------------------------------------------+
| "OutgoingDocumentsList.aspx" | 5172680 |
| "DocumentPreview.aspx" | 3846455 |
| "Dashboard.aspx" | 3579022 |
| "ActualTasks.aspx" | 3051043 |
| "DocumentFillMissingRequisites.aspx" | 1713004 |
| "EditDocument.aspx" | 1373928 |
| "PaymentsList.aspx" | 1338063 |
| "Login.aspx" | 1285447 |
| "ReportingRequisites.aspx" | 884077 |
| "ContractorInfo.aspx" | 759665 |
+------------------------------------------------+
10 rows
151328 ms
Compiler CYPHER 2.2
Planner COST
Projection
|
+Top
|
+EagerAggregation
|
+Filter
|
+Expand(All)(0)
|
+Expand(All)(1)
|
+Expand(All)(2)
|
+NodeUniqueIndexSeek
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+----------+----------+------------------+---------------------------+
| Projection | 881 | 10 | 20 | count, p, p.Name | p.Name, count |
| Top | 881 | 10 | 0 | count, p | { AUTOINT1}; count |
| EagerAggregation | 881 | 173 | 0 | count, p | p |
| Filter | 776388 | 35941815 | 0 | p | NOT(anon[38] == anon[60]) |
| Expand(All)(0) | 776388 | 35941815 | 49287436 | p | ()-[:At]->(p) |
| Expand(All)(1) | 383997 | 13345621 | 19478500 | | ()-[:Next]->() |
| Expand(All)(2) | 189923 | 6132879 | 6132880 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+----------+----------+------------------+---------------------------+
Total database accesses: 74898837
答案 0 :(得分:0)
正如我之前提到的,在你的另一个问题中,如果你能写一个Java based server extension,你可以很容易地做到。
// initialize counters
Map<Node,AtomicInteger> pageCounts = new HashMap<>(300);
for (Node page : graphDb.findNode(Page)) pageCounts.put(page,new AtomicInteger());
// find start page
Label Page = DynamicLabel.label("Page");
Node page = graphDB.findNode(Page,"Name",pageName).iterator().next();
// follow page-view relationships
for (Relationship at : page.getRelationships(At, INCOMING)) {
// follow singular next relationship
Relationship at2 = at.getStartNode().getSingleRelationship(Next,OUTGOING);
if (at2==null) continue;
// follow singular page-view relationship to end-page
Node page2 = at2.getSingleRelationship(At,OUTGOING).getEndNode();
// increment counter
pageCounts.get(page2).incrementAndGet();
}
// sort pages by count descending
List pages = new ArrayList(pageCounts.entrySet())
Collections.sort(pages,new Comparator<Map.Entry<Node,Integer>>() {
public int compare(Map.Entry<Node,Integer> e1, Map.Entry<Node,Integer> e2) {
return - Integer.compare(e1.getValue(),e2.getValue());
}
});
// return top 10
return pages.subList(0,10);
对于Cypher,我会尝试这样的事情:
match (:Page {Name:'#########.aspx'})<-[:At]-(pv:PageView)
WITH distinct pv
MATCH (pv)-[:Next]->(pv2:PageView)
with distinct pv2
match (pv2)-[:At]->(p:Page)
return p.Name,count(*) as count
order by count desc
limit 10;
我为它编写了一个测试并在我更大的linux机器上运行它,结果更加明智:Java中的 1.6s 和Cypher中的 5s max 之间。
以下是代码和结果:https://gist.github.com/jexp/94f75ddb849f8c41c97c
在Cypher:
-------------------
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
return p.Name,count(*) as count
order by count desc
limit 10;
+-------------------+
| p.Name | count |
+-------------------+
| "Page169" | 975 |
| "Page125" | 959 |
| "Page106" | 955 |
| "Page274" | 951 |
| "Page176" | 947 |
| "Page241" | 944 |
| "Page30" | 942 |
| "Page44" | 938 |
| "Page1" | 938 |
| "Page118" | 938 |
+-------------------+
10 rows
in 3212 ms
[Compiler CYPHER 2.2
Planner COST
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
| Top | 488 | 10 | 0 | FRESHID71, FRESHID76 | { AUTOINT1}; |
| EagerAggregation | 488 | 300 | 0 | FRESHID71, FRESHID76 | |
| Projection | 238460 | 264828 | 529656 | FRESHID71, p | |
| Filter | 238460 | 264828 | 0 | p | NOT(anon[29] == anon[51]) |
| Expand(All)(0) | 238460 | 264828 | 529656 | p | ()-[:At]->(p) |
| Expand(All)(1) | 238460 | 264828 | 778522 | | ()-[:Next]->() |
| Expand(All)(2) | 476922 | 513694 | 513695 | | ()<-[:At]-() |
| NodeUniqueIndexSeek | 1 | 1 | 1 | | :Page(Name) |
+---------------------+---------------+--------+--------+--------------------------+---------------------------+
Total database accesses: 2351530]
在Java中:
-------------------
Java took 1618 ms
Node[169]=975
Node[125]=959
Node[106]=955
Node[274]=951
Node[176]=947
Node[241]=944
Node[30]=942
Node[1]=938
Node[44]=938
Node[118]=938
您可以采取的加速Cypher查询的方法是仅在节点上进行聚合,并且只返回最后10行的page.Name属性,速度要快得多。
match (:Page {Name:'Page1'})<-[:At]-()-[:Next]->()-[:At]->(p)
with p,count(*) as count
order by count desc
limit 10 return p.Name, count