Neo4j query performance: tag search

Time: 2018-05-08 17:13:58

Tags: neo4j cypher

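I want to filter messages on certain words (tags).

For now I only want messages that contain both of two words.

We built a time tree so that we don't have to search through all messages. Ideally I want to search over one month (30 days).

This month has 57,371 messages.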

PROFILE
MATCH (startleaf:Hour{hash: '2018/04/01/05'})
, (endleaf:Hour{hash: '2018/04/30/05'})
, p = shortestPath((startleaf)-[:NEXT*0..]->(endleaf))
UNWIND nodes(p) AS leaf
MATCH (leaf)<-[:SENDED]-(message:TS_P2000Message)
WITH distinct message
MATCH (message)-[:HAS_WORD]->(TS_Word { name:'someren'})
WITH distinct message AS message
MATCH (message)-[:HAS_WORD]->(TS_Word { name:'kruisbaan'})
WITH distinct message AS message
WITH count(message) AS results, collect(message) AS messages
UNWIND(messages) AS message
WITH results, message AS message
SKIP 0 LIMIT 15
RETURN results, message

Cypher version: CYPHER 3.3, planner: COST, runtime: INTERPRETED. 1065560 total db hits in 2244 ms.


When I want all messages without the word filter, the query is much faster!

PROFILE
MATCH (startleaf:Hour{hash: '2018/04/01/05'})
, (endleaf:Hour{hash: '2018/04/30/05'})
, p = shortestPath((startleaf)-[:NEXT*0..]->(endleaf))
UNWIND nodes(p) AS leaf
MATCH (leaf)<-[:SENDED]-(message:TS_P2000Message)
WITH distinct message
WITH count(message) AS results, collect(message) AS messages
UNWIND(messages) AS message
WITH results, message AS message
SKIP 0 LIMIT 15
RETURN results, message

Cypher version: CYPHER 3.3, planner: COST, runtime: INTERPRETED. 115167 total db hits in 268 ms.

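When we narrow the query to one week it is very fast, but for the best results I want one month.

So what can I do to make this query faster?

Maybe this helps: this screenshot shows the structure.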

Edit:

It is faster when I don't use the words and instead filter with regular expressions in a WHERE clause...

PROFILE
MATCH (startleaf:Hour{hash: '2018/04/01/05'})
, (endleaf:Hour{hash: '2018/04/30/05'})
, p = shortestPath((startleaf)-[:NEXT*0..]->(endleaf))
UNWIND nodes(p) AS leaf
MATCH (leaf)<-[:SENDED]-(message:TS_P2000Message)
WHERE message.message =~ '(?i).*someren.*' AND message.message =~ '(?i).*kruisbaan.*'
WITH count(message) AS results, collect(message) AS messages
UNWIND(messages) AS message
WITH results, message AS message
SKIP 0 LIMIT 15
RETURN results, message

Cypher version: CYPHER 3.3, planner: COST, runtime: INTERPRETED. 115186 total db hits in 342 ms.

2 answers:

Answer 0 (score: 0)

You could try this query:

MATCH p = shortestPath((startleaf:Hour {hash: '2018/04/01/05'})-[:NEXT*0..]->(endleaf:Hour {hash: '2018/04/30/05'}))
WITH nodes(p) AS dates
MATCH (message:TS_P2000Message)-[:SENDED]->(leaf),
      (message)-[:HAS_WORD]->(word:TS_Word)
WHERE leaf IN dates
  AND word.name IN ['kruisbaan', 'someren']
// keep only messages that are linked to both words
WITH message, count(DISTINCT word) AS matchedWords
WHERE matchedWords = 2
WITH count(message) AS results, collect(message) AS messages
UNWIND messages AS message
WITH results, message
SKIP 0 LIMIT 15
RETURN results, message

Also, could you try this query with an index on :TS_P2000Message(name)?

Answer 1 (score: 0)

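You forgot the colon in front of the TS_Word label: without it, TS_Word is just a variable, not a label. You should have an index on :TS_Word(name).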
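For reference, a minimal sketch of the schema indexes mentioned here, using the Neo4j 3.x `CREATE INDEX ON` syntax (the question runs on Cypher 3.3). The `:Hour(hash)` index is an added assumption: the time tree may already have an index or uniqueness constraint on `hash`, in which case only the first statement is needed. Run each statement separately.

```cypher
// Index the word name so that (:TS_Word {name: ...}) can use an index seek
CREATE INDEX ON :TS_Word(name);
// Assumption: only needed if :Hour(hash) is not already indexed or constrained
CREATE INDEX ON :Hour(hash);
```

Running `PROFILE` on the query again afterwards should typically show a `NodeIndexSeek` on `TS_Word` instead of a scan.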

I think the words you pass in are more selective than your time filter.

So I would do it like this:

MATCH (message:TS_P2000Message)-[:HAS_WORD]->(:TS_Word { name:'someren'}),
      (message)-[:HAS_WORD]->(:TS_Word { name:'kruisbaan'})
MATCH (leaf:Hour)<-[:SENDED]-(message)
WHERE '2018/04/01/05' <= leaf.hash <= '2018/04/30/05'
WITH count(message) AS results, collect(message) AS messages
UNWIND messages AS message
RETURN results, message
SKIP 0 LIMIT 15
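One way to check the assumption that the words are more selective than the time window is to count how many messages carry each search word and compare that with the 57,371 messages in the month. A minimal sketch, assuming the `HAS_WORD` model from the question:

```cypher
// How many messages contain each search word? Small counts suggest that
// starting the match from the word nodes touches fewer nodes than the time tree
MATCH (w:TS_Word)<-[:HAS_WORD]-(m:TS_P2000Message)
WHERE w.name IN ['someren', 'kruisbaan']
RETURN w.name AS word, count(DISTINCT m) AS messages
ORDER BY messages
```

If either count is far below 57,371, starting from the words is likely cheaper than expanding a whole month of the time tree.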