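I have some Movie nodes with properties such as genres, writers, and languages (each an array of strings).
I want to find all nodes that have similar array properties, together with a similarity percentage.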
For example, for the genres property:
Movie1: Crime, Drama, Mystery
Movie2: Drama
gives a similarity of 33%.
In addition, I would like a query that returns all movies that have at least one genre, writer, or language in common.
I know I have to use the collect function, but how do I compare the arrays?
For example:
Movie1: Crime, Drama, Mystery
Movie2: Crime, Mystery
Movie3: Mystery, Comedy
Movie4: Comedy
Group 1: Movie1, Movie2, Movie3
Group 2: Movie3, Movie4
Answer 0 (score: 2)
You can use REDUCE to count the elements the two lists have in common (the size of their intersection):
WITH
  ['Drama','Crime','Mystery'] as genre1,
  ['Drama'] as genre2
WITH
  genre1,
  genre2,
  CASE WHEN size(genre1)>size(genre2)
    THEN size(genre1)
    ELSE size(genre2)
  END as maxSize,
  REDUCE(acc=0,
    genre in genre1
    | acc + CASE WHEN genre in genre2 THEN 1 ELSE 0 END
  ) as similarity
RETURN genre1,
       genre2,
       100.0 * similarity / maxSize as similarity
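To apply the same idea to stored nodes rather than literal lists, something like the following may work. It is only a sketch: it assumes `Movie` nodes with a `genres` list property (as in the grouping query in this answer) and a hypothetical `title` property, and it uses a list comprehension instead of REDUCE to count the overlap.

```cypher
// Sketch only: `title` is an assumed property name; adjust to your schema.
MATCH (a:Movie), (b:Movie)
WHERE id(a) < id(b)                       // visit each unordered pair once
WITH a, b,
     // number of genres of a that also appear in b
     size([g IN a.genres WHERE g IN b.genres]) AS common,
     // size of the longer of the two lists
     CASE WHEN size(a.genres) > size(b.genres)
          THEN size(a.genres)
          ELSE size(b.genres)
     END AS maxSize
WHERE common > 0                          // keep only pairs that share a genre
RETURN a.title AS movie1,
       b.title AS movie2,
       100.0 * common / maxSize AS similarity
ORDER BY similarity DESC
```

Note that this compares every pair of movies, so the cost grows quadratically with the number of `Movie` nodes; on a large graph it would probably need to be restricted to a subset.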
Or you can use the apoc.coll.intersection function from the APOC library:
WITH
  ['Drama','Crime','Mystery'] as genre1,
  ['Drama'] as genre2
WITH
  genre1,
  genre2,
  apoc.coll.max([size(genre1), size(genre2)]) as maxSize,
  apoc.coll.intersection(genre1, genre2) as similarity
RETURN genre1,
       genre2,
       100.0 * size(similarity) / maxSize as similarity
If you only want to find the nodes that share at least one genre:
MATCH (M:Movie)
UNWIND M.genres as genre
WITH genre,
     M
     ORDER BY id(M) ASC
WITH genre,
     collect(M) as movies
RETURN distinct movies as movies
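The query above groups by genre only, and it also returns groups that contain a single movie. A pairwise sketch covering all three properties from the question might look like the following. It assumes list properties named `writers` and `languages` exist alongside `genres`; those two names are guesses and should be adjusted to the actual schema.

```cypher
// Sketch only: `writers` and `languages` are assumed property names.
MATCH (a:Movie), (b:Movie)
WHERE id(a) < id(b)                                   // each pair once
  AND ( any(g IN a.genres    WHERE g IN b.genres)     // shared genre
     OR any(w IN a.writers   WHERE w IN b.writers)    // shared writer
     OR any(l IN a.languages WHERE l IN b.languages)  // shared language
      )
RETURN a, b
```

If the per-genre groups are what is wanted, adding `WHERE size(movies) > 1` after the `collect` step in the original query should drop the groups that contain only one movie.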