Neo4j: match nodes with similar properties (arrays)

Date: 2017-05-01 13:09:06

Tags: arrays neo4j

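I have some Movie nodes with properties such as genres, writers and languages (arrays of strings).

I want to get all nodes that have similar array properties, together with a similarity percentage.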

For example, for the genres property:

Movie1: Crime, Drama, Mystery

Movie2: Drama

33% similarity

Also, I want a query that returns all movies that have at least one genre, writer or language in common.

I know I have to use the collect function, but how do I compare the arrays?

For example:

Movie1: Crime, Drama, Mystery

Movie2: Crime, Mystery

Movie3: Mystery, Comedy

Movie4: Comedy

Group 1: Movie1, Movie2, Movie3

Group 2: Movie3, Movie4

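1 Answer:

Answer 0 (score: 2)

You can use REDUCE to count how many elements the two lists have in common, then divide by the size of the longer list: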

WITH 
  ['Drama','Crime','Mystery'] as genre1,
  ['Drama'] as genre2
WITH
  genre1, 
  genre2,
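  // the size of the longer list is the denominator
  CASE WHEN size(genre1)>size(genre2) 
       THEN size(genre1) 
       ELSE size(genre2)
  END as maxSize, 
  // count the elements of genre1 that also appear in genre2
  REDUCE(acc=0, 
         genre in genre1 
         | acc + CASE WHEN genre in genre2 THEN 1 ELSE 0 END
  ) as similarity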
RETURN genre1, 
       genre2, 
       100.0 * similarity / maxSize as similarity
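To run the same calculation on stored nodes rather than literal lists, a sketch along these lines should work. It assumes each :Movie node keeps its genres in a string-array property named `genres` (the name used in the last query of this answer). Because it compares every pair of movies, its cost grows quadratically with the number of Movie nodes.

```cypher
// Sketch: pairwise genre similarity between stored Movie nodes.
// Assumes a string-array property `genres` on every :Movie.
MATCH (m1:Movie), (m2:Movie)
WHERE id(m1) < id(m2)             // each unordered pair once, no self-pairs
WITH m1, m2,
     // number of genres of m1 that also appear in m2
     REDUCE(acc = 0, g IN m1.genres |
            acc + CASE WHEN g IN m2.genres THEN 1 ELSE 0 END) AS common,
     // size of the longer genre list
     CASE WHEN size(m1.genres) > size(m2.genres)
          THEN size(m1.genres)
          ELSE size(m2.genres)
     END AS maxSize
WHERE maxSize > 0                 // skip pairs with empty or missing lists
RETURN m1, m2, 100.0 * common / maxSize AS similarity
ORDER BY similarity DESC
```

For Movie1 (Crime, Drama, Mystery) and Movie2 (Drama) this yields 1 common genre out of 3, i.e. about 33.3, matching the example in the question. On Neo4j 5, `id()` is deprecated in favor of `elementId()`.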

Or you can use the apoc.coll.intersection function from the APOC library:

WITH 
  ['Drama','Crime','Mystery'] as genre1,
  ['Drama'] as genre2
WITH
  genre1, 
  genre2,
  apoc.coll.max([size(genre1), size(genre2)]) as maxSize,
  apoc.coll.intersection(genre1, genre2) as similarity
RETURN genre1, 
       genre2, 
       100.0 * size(similarity) / maxSize as similarity

If you only want to find groups of movies that share at least one genre:

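MATCH (M:Movie)
// one row per (genre, movie)
UNWIND M.genres as genre
// sort before collecting so every list is built in the same order
WITH genre, 
     M
     ORDER BY id(M) ASC
// one list of movies per genre
WITH genre, 
     collect(M) as movies
// genres shared by exactly the same movies give identical lists, returned once
RETURN distinct movies as movies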
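The question also asks for movies sharing at least one genre, writer or language. A minimal pairwise sketch, assuming string-array properties named `genres`, `writers` and `languages` (only `genres` appears in the answer; the other two names are hypothetical), uses list comprehensions to collect the shared values per property and keeps pairs where any of them is non-empty.

```cypher
// Sketch: pairs of movies sharing at least one genre, writer or language.
// Property names `writers` and `languages` are hypothetical.
MATCH (m1:Movie), (m2:Movie)
WHERE id(m1) < id(m2)             // each unordered pair once
WITH m1, m2,
     // coalesce turns a missing property into an empty list
     [g IN coalesce(m1.genres, [])    WHERE g IN coalesce(m2.genres, [])]    AS sharedGenres,
     [w IN coalesce(m1.writers, [])   WHERE w IN coalesce(m2.writers, [])]   AS sharedWriters,
     [l IN coalesce(m1.languages, []) WHERE l IN coalesce(m2.languages, [])] AS sharedLanguages
WHERE size(sharedGenres) > 0
   OR size(sharedWriters) > 0
   OR size(sharedLanguages) > 0
RETURN m1, m2, sharedGenres, sharedWriters, sharedLanguages
```

Returning the shared values, rather than only a boolean, shows why each pair matched; if only the pairs are needed, `any(g IN m1.genres WHERE g IN m2.genres)` is a shorter test per property.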