How to use writeRelationshipType with Jaccard similarity

Time: 2018-12-21 14:34:24

Tags: neo4j cypher py2neo

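I am trying to recommend keywords based on a Jaccard similarity cutoff. The end goal is to use py2neo and call this query whenever a user needs keyword recommendations. My reasoning is: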

(Title1)-[:HAS_KEYWORDS]->(Keyword1)<-[:HAS_KEYWORDS]-(Title2)-[:HAS_KEYWORDS]->(Keyword2)

I followed the example in the manual:
https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-jaccard/
My test data is laid out as follows. The CSV used to create all Title nodes:

title_id,title  
T1,Article Title 1  
T2,Article Title 2 

The CSV I use to create the relationships:

title_id,keyword_id,keyword  
T1,K1,aaa  
T1,K2,bbb  
T1,K3,ccc  
T1,K4,ddd  
T2,K1,aaa  
T2,K5,eee  
T2,K6,fff  
T2,K4,ddd  

This is how I am currently computing the similarity. I tried the following:

MATCH (search_query:Title)-[:HAS_KEYWORDS]->(k_id:Keyword)
<-[:HAS_KEYWORDS]-(return_query:Title)-[r2:HAS_KEYWORDS]->(rec_k:Keyword)  
WITH {item:id(return_query), categories: collect(id(rec_k))} as userData  
WITH collect(userData) as data  
CALL algo.similarity.jaccard.stream(data, {similarityCutoff: 0.0})  
YIELD item1, item2, count1, count2, intersection, similarity  
RETURN algo.getNodeById(item1) AS from, algo.getNodeById(item2) AS to,  intersection, similarity ORDER BY similarity DESC  

However, reading further, the example uses a different query, which I also tried to replicate:

MATCH (search_query:Title)
  -[:HAS_KEYWORDS]->(k_id:Keyword)
 <-[:HAS_KEYWORDS]-(return_query:Title)
  -[r2:HAS_KEYWORDS]->(rec_k:Keyword)     
WITH {item:id(return_query), categories: collect(id(rec_k))} as userData 
WITH collect(userData) as data  
CALL algo.similarity.jaccard(data, {topK: 1, similarityCutoff: 0.0, write:true})  
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100  
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95  

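I am trying to take the next step and query the SIMILAR relationships, but when I inspected the result I found that no SIMILAR relationships had been created in my test graph. So my first question is: Q: Why do no SIMILAR relationships appear in my graph?
(A related sub-question: I believe my MATCH logic finds every title that shares at least one keyword with another title, where that other title must also have at least one other, unrelated keyword. If I use the second example, will I only be able to create a single SIMILAR relationship?)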

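My second question relates to my end goal. Q: If I understand the query correctly, only the most similar result gets a SIMILAR relationship stored in the database; can I use the same query inside a function? Currently my function looks like this: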

def get_similar_keywords(self):
    # `graph` is assumed to be a py2neo Graph defined elsewhere in the module.
    query = '''
    MATCH (search_query:Title)
          -[:HAS_KEYWORDS]->(k_id:Keyword)
         <-[:HAS_KEYWORDS]-(return_query:Title)
          -[r2:HAS_KEYWORDS]->(rec_k:Keyword)
    WITH {item:id(return_query), categories: collect(id(rec_k))} as userData
    WITH collect(userData) as data
    CALL algo.similarity.jaccard(data, {topK: 1, similarityCutoff: 0.0, write:true})
    YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
    RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95
    '''
    return graph.run(query, username=self.username)

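Right now my goal is to find out: 1. whether the idea behind my MATCH condition is wrong; 2. how to use writeRelationshipType to create the SIMILAR relationships; and 3. whether these queries can be reused.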

Currently, after using the variables, I think my Jaccard similarity values look correct:

  

╒═══════╤═════════════════╤═══════╤═══════════════════════╤═══════════════╤═══════════════════╤══════════════════╤══════════════════╤══════════════════╕
│"nodes"│"similarityPairs"│"write"│"writeRelationshipType"│"writeProperty"│"min"              │"max"             │"mean"            │"p95"             │
╞═══════╪═════════════════╪═══════╪═══════════════════════╪═══════════════╪═══════════════════╪══════════════════╪══════════════════╪══════════════════╡
│7      │5                │false  │"SIMILAR"              │"score"        │0.01162785291671753│0.5844191908836365│0.2831512808799744│0.5844191908836365│
└───────┴─────────────────┴───────┴───────────────────────┴───────────────┴───────────────────┴──────────────────┴──────────────────┴──────────────────┘
I just don't understand why it reports "SIMILAR" while nothing shows up in the graph...

If I am on the right track, I would like to replicate the following code:

MATCH (p:Person {name: "Praveena"})-[:SIMILAR]->(other),
      (other)-[:LIKES]->(cuisine)  
WHERE not((p)-[:LIKES]->(cuisine))  
RETURN cuisine.name AS cuisine  

...and return the recommended keywords through py2neo.

Thank you very much,

Eric

1 Answer:

Answer 0: (score: 0)

If you want to write back the SIMILAR relationships, you must use similarityCutoff: 0.1 or higher. Check the source code to see why.

Also, your MATCH query is a bit off, so the write-back query should look something like this:

MATCH (search_query:Title)-[:HAS_KEYWORDS]->(k_id:Keyword)

WITH {item:id(search_query), categories: collect(id(k_id))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard(data, {topK: 1, similarityCutoff: 0.1, write:true})
YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
RETURN nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, p95

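You pass the id of each title as item and the ids of all the keywords that describe that title as categories, and the algorithm takes care of the rest.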
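To make the item/categories input concrete, here is a minimal pure-Python sketch that builds the same shape of data from the relationship CSV in the question and computes the Jaccard similarity by hand. It does not call Neo4j; the helper names `build_data` and `jaccard` are made up for illustration, and it uses the CSV's `title_id` and `keyword_id` values where the real procedure receives internal node ids.

```python
import csv
import io
from itertools import combinations

# Relationship CSV from the question, inlined so the sketch is self-contained.
ROWS = """title_id,keyword_id,keyword
T1,K1,aaa
T1,K2,bbb
T1,K3,ccc
T1,K4,ddd
T2,K1,aaa
T2,K5,eee
T2,K6,fff
T2,K4,ddd
"""


def build_data(csv_text):
    """Group keyword ids per title, mirroring {item: ..., categories: [...]}."""
    categories = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        categories.setdefault(row["title_id"], set()).add(row["keyword_id"])
    return [{"item": title, "categories": sorted(keys)}
            for title, keys in categories.items()]


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two id lists."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


data = build_data(ROWS)
for left, right in combinations(data, 2):
    score = jaccard(left["categories"], right["categories"])
    print(left["item"], right["item"], round(score, 4))
```

With the test data, T1 and T2 share K1 and K4 out of six distinct keywords, so the pair scores 2/6, roughly 0.33, which is comfortably above a cutoff of 0.1.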

Now that the relationships are stored, you can run the recommendation query.

MATCH (p:Title {name: "T1"})-[:SIMILAR]->(other),
      (other)-[:HAS_KEYWORDS]->(keyword)  
WHERE not((p)-[:HAS_KEYWORDS]->(keyword))  
RETURN keyword.name AS keywords
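
As for calling this from py2neo, one possible shape is sketched below. It assumes `graph` is a py2neo `Graph` (or anything with a compatible `run(cypher, **params)` method), that the SIMILAR relationships have already been written, and that titles and keywords carry a `name` property as in the query above. The function name `recommend_keywords` and the `$title` parameter are illustrative, not part of the original code.

```python
# Same recommendation query as above, with the title passed as a parameter.
RECOMMEND_QUERY = """
MATCH (p:Title {name: $title})-[:SIMILAR]->(other),
      (other)-[:HAS_KEYWORDS]->(keyword)
WHERE NOT (p)-[:HAS_KEYWORDS]->(keyword)
RETURN keyword.name AS keywords
"""


def recommend_keywords(graph, title):
    """Return keyword names used by similar titles but not by `title`.

    `graph` is assumed to expose a py2neo-style run(cypher, **params)
    method whose result can be iterated record by record.
    """
    cursor = graph.run(RECOMMEND_QUERY, title=title)
    return [record["keywords"] for record in cursor]


# Hypothetical usage; the connection details are placeholders:
#   from py2neo import Graph
#   graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))
#   print(recommend_keywords(graph, "T1"))
```

Passing the title as a query parameter rather than formatting it into the string lets the same function be reused for any title.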