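I am doing some text analysis in Neo4j, and I want to write a query that sorts its results by count in descending order. My data is structured as follows:
(Word)-[:NEXT]->(Word)-[:NEXT]->
and so on.
I want a query that tells me which 3-word combinations, 4-word combinations, etc. are the most popular. I tried the following, but it always returns a count of 1 for every word combination: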
MATCH p = (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word)
WITH [w1.name,w2.name,w3.name] AS word_pair
RETURN COUNT(word_pair) as frequency, word_pair
ORDER BY frequency DESC
LIMIT 50
Answer 0 (score: 0)
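The frequency of a pattern is always 1 because each distinct word is a single node, so a given word sequence matches only once. The information about how often it occurred is instead packed into the count property of the relationships. So you do not need to count the occurrences of the pattern; you only need to find the minimum value of this property along the path: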
Sample data:
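// Build a word graph from two sample sentences.
UNWIND ["My cat eats fish on Saturday",
        "My Cat eats cat food on Saturdays"] AS text
WITH split(toLower(text), " ") AS words
// Visit every adjacent pair (words[i], words[i+1]).
UNWIND range(0, size(words)-2) AS i
// Note: an interior word is merged once as w1 and once as w2, so Word.count
// is incremented twice per occurrence. The query below uses only r.count.
MERGE (w1:Word {name: words[i]})
  ON CREATE SET w1.count = 1
  ON MATCH SET w1.count = w1.count + 1
MERGE (w2:Word {name: words[i+1]})
  ON CREATE SET w2.count = 1
  ON MATCH SET w2.count = w2.count + 1
// r.count records how many times this word pair appeared.
MERGE (w1)-[r:NEXT]->(w2)
  ON CREATE SET r.count = 1
  ON MATCH SET r.count = r.count + 1;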
Query:
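// [:NEXT*2] matches 3-word sequences; use [:NEXT*3] for 4-word sequences.
MATCH p = (:Word)-[:NEXT*2]->(:Word)
WITH [n IN nodes(p) | n.name] AS word_pair,
     [r IN relationships(p) | r.count] AS counts
// Take the smallest relationship count along each path.
UNWIND counts AS relCount
RETURN word_pair,
       min(relCount) AS frequency
ORDER BY frequency DESC
LIMIT 50;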
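To check what the minimum-count approach reports, here is a minimal sketch in plain Python, run outside Neo4j on the two sample sentences. It rebuilds the relationship counts the way the MERGE statements would, then compares min(r.count) for every 2-hop path with the exact trigram count taken from the text. The helper names (ngrams, edge_count, path_frequency) are made up for this illustration and are not part of any Neo4j API.

```python
from collections import Counter

sentences = [
    "My cat eats fish on Saturday",
    "My Cat eats cat food on Saturdays",
]


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


tokenized = [s.lower().split(" ") for s in sentences]

# Relationship counts, as the MERGE ... r.count logic would accumulate them.
edge_count = Counter(pair for words in tokenized for pair in ngrams(words, 2))

# Exact trigram frequencies counted directly from the text.
exact = Counter(tri for words in tokenized for tri in ngrams(words, 3))


def path_frequency(path):
    """Minimum relationship count along a path, or 0 if a hop is missing."""
    return min(edge_count.get(pair, 0) for pair in ngrams(path, 2))


# Every 2-hop path in the word graph (what [:NEXT*2] would match).
# A relationship may not be used twice within one path.
paths = {
    (a, b, d)
    for (a, b) in edge_count
    for (c, d) in edge_count
    if b == c and (a, b) != (c, d)
}

for path in sorted(paths, key=lambda p: (-path_frequency(p), p)):
    print(path, "min(r.count) =", path_frequency(path), "exact =", exact[path])
```

On this data the two numbers agree for "my cat eats" (both 2), but a path such as "my cat food" gets a minimum count of 1 although that sequence never appears in either sentence. So the minimum relationship count is best read as an upper bound on the true frequency; if exact counts matter, the sequences would need to be stored or counted explicitly.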