Question

I have a simple database that I use to analyze Twitter data within a specific group.

The data model is:

(:Person)-[:TWEETS_TO]->(:Twitter_Account)

and

(:Twitter_Account)-[:FOLLOWS]->(:Twitter_Account)

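There are only a little over 500 (:Person) nodes, but about 500,000 (:Twitter_Account) nodes. In other words, most (:Twitter_Account) nodes are not connected to a person.

I want to count the FOLLOWS relationships, but only among the 500 or so Twitter accounts that are connected to people. Searching around, I found this neo4j blog post and SO post, which suggest a query like this: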

MATCH (p:Person)-[:TWEETS_TO]->(t1:Twitter_Account)
WITH t1, size((t1)-[:FOLLOWS]->(:Twitter_Account)<-[:TWEETS_TO]-(:Person)) AS following
RETURN t1, following
ORDER BY following
LIMIT 5

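Profiling gives:

Cypher version: CYPHER 3.2, planner: COST, runtime: INTERPRETED. 2938092 total db hits in 1356 ms.

As you can see, it is relatively fast, but my gut says there should be a way to write the query without so many db hits, since we are only looking at a small, easily defined subset of the data. Everything else I have tried (for example, matching the two Twitter accounts first) results in a Cartesian product that is slower than the query above.

Is there a way to count these relationships without looking at every Twitter account?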

Answer 1

You may want to consider adding a separate label to the Twitter_Account nodes that are connected to a person, so they are easier to query later.

MATCH (t:Twitter_Account)
WHERE exists(()-[:TWEETS_TO]->(t))
SET t:Connected_Account

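If your graph needs to handle updates, make sure that whenever a new account is added you check whether it is connected to a person and add the label accordingly.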
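One way to keep the label in sync is to set it in the same statement that creates the TWEETS_TO relationship. The following is only a sketch: the `name` and `handle` properties and their values are hypothetical, since the question does not say how nodes are identified. The two statements are meant to be run separately.

```cypher
// Sketch only: the `name` and `handle` properties and their values are
// hypothetical; substitute whatever identifies nodes in your graph.
MATCH (p:Person {name: 'Alice'})
MERGE (t:Twitter_Account {handle: 'alice_tw'})
MERGE (p)-[:TWEETS_TO]->(t)
SET t:Connected_Account;

// If TWEETS_TO relationships can also be deleted, an occasional cleanup
// removes the label from accounts that no longer have any.
MATCH (t:Connected_Account)
WHERE NOT exists(()-[:TWEETS_TO]->(t))
REMOVE t:Connected_Account;
```

Setting a label that a node already has is a no-op, so the first statement is safe to run repeatedly.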

Once that is in place, your query becomes:

MATCH (t1:Connected_Account)
WITH t1, size((t1)-[:FOLLOWS]->(:Connected_Account)) as following
RETURN t1, following 
ORDER BY following 
LIMIT 5

If there are only around 500 :Connected_Account nodes, this should drastically reduce the db hits and speed up the query.
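To check whether the label actually pays off on your data, you could prefix the query with PROFILE and compare the reported db hits against the figure in the question. This is a sketch that assumes the :Connected_Account label has already been applied as described above.

```cypher
// PROFILE executes the query and reports db hits per operator,
// so the total can be compared with the original plan.
PROFILE
MATCH (t1:Connected_Account)
WITH t1, size((t1)-[:FOLLOWS]->(:Connected_Account)) AS following
RETURN t1, following
ORDER BY following
LIMIT 5
```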

Answer 2

You only need to do a database search for the Twitter_Account nodes that are "owned" by any Person.

For example:

MATCH (:Person)-[:TWEETS_TO]->(t1:Twitter_Account)
WITH COLLECT(t1) AS accts
UNWIND accts AS acct
OPTIONAL MATCH (acct)-[:FOLLOWS]->(t2)
WHERE t2 IN accts
RETURN acct, COUNT(t2) AS following
ORDER BY following
LIMIT 5

In this query, we find all the Twitter_Account nodes owned by a Person and keep that collection in accts. We then UNWIND that collection to find, for each owned account (acct), how many owned accounts (t2) it follows. Finally, we return each owned acct along with the number of owned accounts that it follows. (If you only want to return owned accounts that follow at least one owned account, replace OPTIONAL MATCH with MATCH.)
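For the stricter variant mentioned in the parenthetical, a sketch with a plain MATCH might look like the following. The DISTINCT inside COLLECT is an addition not present in the answer; it is there on the assumption that more than one Person could tweet to the same account, which would otherwise put duplicates into accts.

```cypher
// Variant using MATCH in place of OPTIONAL MATCH: owned accounts that
// follow no other owned account are dropped from the result.
// DISTINCT is an assumption-driven addition (see the note above).
MATCH (:Person)-[:TWEETS_TO]->(t1:Twitter_Account)
WITH COLLECT(DISTINCT t1) AS accts
UNWIND accts AS acct
MATCH (acct)-[:FOLLOWS]->(t2)
WHERE t2 IN accts
RETURN acct, COUNT(t2) AS following
ORDER BY following
LIMIT 5
```

Because rows without a matching t2 disappear, the lowest possible value of following in this variant is 1 rather than 0.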

Reducing the cost of a Cypher query
