Question

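My goal is to use Neo4j to run two different kinds of search over documents. I'll use recipes (documents) as my example. Say I have some ingredients (keywords) on hand (milk, butter, flour, salt, sugar, eggs...), and my database holds a number of recipes, each with its ingredients attached. I want to enter my list and get two different results. The first is the recipes that come closest to including all of the ingredients I entered. The second is a combination of recipes that together include all of my ingredients.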

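Given: milk, butter, flour, salt, sugar, eggs

The results for the first case might be:

1.) Sugar cookies

2.) Butter cookies

The second result might be:

1.) Flat bread and Gogel-Mogel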
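To pin down what the first search should return, here is a minimal in-memory sketch in Python (not Neo4j code). Each recipe is reduced to the set of stems of its ingredients, hand-copied from the sample Cypher data later in the question, and recipes are ranked by how many of the entered terms they are missing. `RECIPES` and `rank_by_coverage` are hypothetical names used only for illustration.

```python
# Stems of each recipe's ingredients, hand-copied from the sample Cypher data
# in this question (ingredient list and instructions combined).
RECIPES = {
    "Sugar cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "Butter cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "Flat bread": {"butter", "salt", "milk", "flour"},
    "Gogel-Mogel": {"sugar", "egg"},
}


def rank_by_coverage(recipes, terms):
    """Return (recipe, number of terms it is missing), fewest missing first."""
    scored = [(name, len(terms - stems)) for name, stems in recipes.items()]
    # ties are broken alphabetically so the order is deterministic
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    wanted = {"milk", "butter", "flour", "salt", "sugar", "egg"}
    for name, missing in rank_by_coverage(RECIPES, wanted):
        print(f"{name}: missing {missing}")
```

On this data, Butter cookies and Sugar cookies tie with nothing missing. Flat bread follows (2 missing), then Gogel-Mogel (4 missing).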

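I'm parsing recipes to insert into Neo4j. I extract ingredients from the ingredient list at the top of each recipe, and also from the recipe instructions. I'd like to weight these two sources differently, maybe 60/40 in favor of the ingredient list.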
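Here is one possible way to turn the 60/40 idea into numbers at import time, sketched in Python. An ingredient gets 0.6 for appearing in the ingredient list and 0.4 for being mentioned in the instructions. The function name, the `known` vocabulary parameter and the plain substring check are assumptions made for illustration; none of them come from the question's setup.

```python
# Hypothetical weighting scheme: 60% for appearing in the ingredient list,
# 40% for being mentioned in the instruction text.
LIST_SHARE = 0.6
INSTR_SHARE = 0.4


def ingredient_weights(listed, instructions, known):
    """Combined weight for every known ingredient found in either source."""
    text = instructions.lower()
    weights = {}
    for ingredient in sorted(known):
        score = 0.0
        if ingredient in listed:
            score += LIST_SHARE
        # naive substring check: "egg" also matches "eggs", "salt" matches "unsalted"
        if ingredient in text:
            score += INSTR_SHARE
        if score > 0:
            weights[ingredient] = round(score, 2)
    return weights


if __name__ == "__main__":
    print(ingredient_weights(
        {"butter", "sugar", "flour"},
        "Cream the butter and sugar, then add the eggs.",
        {"butter", "sugar", "flour", "egg", "milk"},
    ))
```

The substring check is deliberately naive. The sample data later in the question takes a different route: it keeps separate INGREDIENTS_LIST and INGREDIENTS_INSTR relationships, which leaves the 60/40 split to query time.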

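I also want to stem each ingredient, in case people enter similar words.

I'm having a hard time coming up with a good data model in Neo4j. I plan to have users enter ingredients in English. I'll stem them in the background and then use the stems to search.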
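Whatever stemmer is used, the same normalization has to run on ingredient names at import time and on the user's terms at query time. Otherwise "eggs" will never meet the stem "egg". The Python sketch below is a deliberately crude stand-in: it strips plural suffixes and checks the result against a known stem vocabulary, and it is not the Porter algorithm. `KNOWN_STEMS` and `to_stems` are hypothetical names. A real stemmer, such as NLTK's `PorterStemmer`, would be the more usual choice.

```python
# Stem values that exist as (:Stem) nodes in the sample data.
KNOWN_STEMS = {"butter", "sugar", "flour", "salt", "egg", "milk",
               "banana", "chocolate", "raisin", "vanilla"}


def to_stems(user_input, known=KNOWN_STEMS):
    """Map comma-separated user terms to known stems; unknown terms are dropped."""
    stems = []
    for raw in user_input.split(","):
        word = raw.strip().lower()
        # try the word itself, then with a plural "es" or "s" removed
        candidates = [word]
        if word.endswith("es"):
            candidates.append(word[:-2])
        if word.endswith("s"):
            candidates.append(word[:-1])
        match = next((c for c in candidates if c in known), None)
        if match is not None and match not in stems:
            stems.append(match)
    return stems


if __name__ == "__main__":
    print(to_stems("Eggs, Milk, butter, Bananas, tomatoes"))
```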

My first idea is this: (image: neo4j data model 1). It feels intuitive to me, but it takes a lot of hops to find all the recipes.
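In this first model, finding recipes is a fixed two-hop expansion: stem to ingredients, then ingredients to recipes. In Cypher that is the pattern `(:Stem)-[:STEM_OF]->(:Ingredient)-->(:Recipe)`. The Python sketch below walks the same two hops over adjacency maps hand-copied from part of the sample data. `STEM_OF`, `USED_IN` and `recipes_for_stem` are illustrative names only.

```python
# Model 1 as adjacency maps (subset of the sample data):
# (:Stem)-[:STEM_OF]->(:Ingredient)-[:INGREDIENTS_LIST|INGREDIENTS_INSTR]->(:Recipe)
STEM_OF = {
    "sugar": ["white sugar", "brown sugar", "powdered sugar"],
    "flour": ["all purpose flour", "wheat flour"],
}
USED_IN = {
    "white sugar": ["Sugar cookies", "Gogel-Mogel"],
    "brown sugar": ["Butter cookies"],
    "powdered sugar": [],
    "all purpose flour": ["Sugar cookies", "Butter cookies"],
    "wheat flour": ["Flat bread"],
}


def recipes_for_stem(stem):
    """Two hops: stem -> ingredients -> recipes."""
    recipes = set()
    for ingredient in STEM_OF.get(stem, []):          # hop 1
        recipes.update(USED_IN.get(ingredient, []))   # hop 2
    return recipes


if __name__ == "__main__":
    print(sorted(recipes_for_stem("sugar")))
```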

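Next, maybe this: (image: neo4j data model 2)

This gets recipes directly from the stems, but I'd need to carry the recipe ID on the relationship (right?) to get back to the actual ingredients.

Third, maybe combine the two like this? (image: neo4j data model 3) But that has a lot of duplication.

Here are some Cypher statements that create the first idea: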

//Create 4 recipes
create (r1:Recipe {rid:'1', title:'Sugar cookies'}), (r2:Recipe {rid:'2', title:'Butter cookies'}), 
(r3:Recipe {rid:'3', title:'Flat bread'}), (r4:Recipe {rid:'4', title:'Gogel-Mogel'}) 

//Adding some ingredients
merge (i1:Ingredient {ingredient:"salted butter"})
merge (i2:Ingredient {ingredient:"white sugar"})
merge (i3:Ingredient {ingredient:"brown sugar"})
merge (i4:Ingredient {ingredient:"all purpose flour"})
merge (i5:Ingredient {ingredient:"iodized salt"})
merge (i6:Ingredient {ingredient:"eggs"})
merge (i7:Ingredient {ingredient:"milk"})
merge (i8:Ingredient {ingredient:"powdered sugar"})
merge (i9:Ingredient {ingredient:"wheat flour"})
merge (i10:Ingredient {ingredient:"bananas"})
merge (i11:Ingredient {ingredient:"chocolate chips"})
merge (i12:Ingredient {ingredient:"raisins"})
merge (i13:Ingredient {ingredient:"unsalted butter"})
merge (i14:Ingredient {ingredient:"wheat flour"})
merge (i15:Ingredient {ingredient:"himalayan salt"})
merge (i16:Ingredient {ingredient:"chocolate bars"})
merge (i17:Ingredient {ingredient:"vanilla flavoring"})
merge (i18:Ingredient {ingredient:"vanilla"})

//Stems added to each ingredient
merge (i1)<-[:STEM_OF]-(s1:Stem {stem:"butter"})
merge (i2)<-[:STEM_OF]-(s2:Stem {stem:"sugar"})
merge (i3)<-[:STEM_OF]-(s2)
merge (i4)<-[:STEM_OF]-(s4:Stem {stem:"flour"})
merge (i5)<-[:STEM_OF]-(s5:Stem {stem:"salt"})
merge (i6)<-[:STEM_OF]-(s6:Stem {stem:"egg"})
merge (i7)<-[:STEM_OF]-(s7:Stem {stem:"milk"})
merge (i8)<-[:STEM_OF]-(s2)
merge (i9)<-[:STEM_OF]-(s4)
merge (i10)<-[:STEM_OF]-(s10:Stem {stem:"banana"})

merge (i11)<-[:STEM_OF]-(s11:Stem {stem:"chocolate"})
merge (i12)<-[:STEM_OF]-(s12:Stem {stem:"raisin"})
merge (i13)<-[:STEM_OF]-(s1)
merge (i14)<-[:STEM_OF]-(s4)
merge (i15)<-[:STEM_OF]-(s5)
merge (i16)<-[:STEM_OF]-(s11)
merge (i17)<-[:STEM_OF]-(s13:Stem {stem:"vanilla"})
merge (i18)<-[:STEM_OF]-(s13)


create (r1)<-[:INGREDIENTS_LIST{weight:.7}]-(i1)
create (r1)<-[:INGREDIENTS_LIST{weight:.6}]-(i2)    
create (r1)<-[:INGREDIENTS_LIST{weight:.5}]-(i4)
create (r1)<-[:INGREDIENTS_LIST{weight:.4}]-(i5)
create (r1)<-[:INGREDIENTS_LIST{weight:.4}]-(i6)
create (r1)<-[:INGREDIENTS_LIST{weight:.2}]-(i7)
create (r1)<-[:INGREDIENTS_LIST{weight:.1}]-(i18)

create (r2)<-[:INGREDIENTS_LIST{weight:.7}]-(i1)
create (r2)<-[:INGREDIENTS_LIST{weight:.6}]-(i3)    
create (r2)<-[:INGREDIENTS_LIST{weight:.5}]-(i4)
create (r2)<-[:INGREDIENTS_LIST{weight:.4}]-(i5)
create (r2)<-[:INGREDIENTS_LIST{weight:.3}]-(i6)
create (r2)<-[:INGREDIENTS_LIST{weight:.2}]-(i7)
create (r2)<-[:INGREDIENTS_LIST{weight:.1}]-(i18)

create (r3)<-[:INGREDIENTS_LIST{weight:.7}]-(i1)
create (r3)<-[:INGREDIENTS_LIST{weight:.6}]-(i5)
create (r3)<-[:INGREDIENTS_LIST{weight:.5}]-(i7)
create (r3)<-[:INGREDIENTS_LIST{weight:.4}]-(i9)

create (r4)<-[:INGREDIENTS_LIST{weight:.6}]-(i2)
create (r4)<-[:INGREDIENTS_LIST{weight:.5}]-(i6)



create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i1)
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i2)   
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i4)
create (r1)<-[:INGREDIENTS_INSTR{weight:.2}]-(i5)
create (r1)<-[:INGREDIENTS_INSTR{weight:.1}]-(i6)
create (r1)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7)


create (r2)<-[:INGREDIENTS_INSTR{weight:.3}]-(i1)
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i3)   
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i4)
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i5)
create (r2)<-[:INGREDIENTS_INSTR{weight:.2}]-(i6)
create (r2)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7)


create (r3)<-[:INGREDIENTS_INSTR{weight:.3}]-(i1)
create (r3)<-[:INGREDIENTS_INSTR{weight:.3}]-(i5)
create (r3)<-[:INGREDIENTS_INSTR{weight:.1}]-(i7)
create (r3)<-[:INGREDIENTS_INSTR{weight:.1}]-(i9)

create (r4)<-[:INGREDIENTS_INSTR{weight:.3}]-(i2)
create (r4)<-[:INGREDIENTS_INSTR{weight:.3}]-(i6)

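And here is a link to a Neo4j console with the statements above: http://console.neo4j.org/?id=3o8y44

How much of a concern are multiple relationships between the same nodes in Neo4j? Also, I can query for a single ingredient, but how would I put together a query that gets recipes given multiple ingredients?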

EDIT: Thanks, Michael! That got me further. I was able to extend your answer:

WITH split("egg, sugar, chocolate, milk, flour, salt",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-[lst:INGREDIENTS_LIST]->(r:Recipe)
// terms is kept as a grouping key so size(terms) may be combined with count(); recent Neo4j versions reject implicit grouping keys
WITH r, terms, size(terms) - count(distinct stem) as notCovered, sum(lst.weight) as weight, collect(distinct stem.stem) as matched
RETURN r, notCovered, matched, weight
ORDER BY notCovered ASC, weight DESC

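This gives me the list of matched ingredients and the weight. How do I change the query to also show the weight of the :INGREDIENTS_INSTR relationship, so that I can use both weights in a calculation? [lst:INGREDIENTS_LIST|INGREDIENTS_INSTR] isn't what I want.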
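With the alternation `[lst:INGREDIENTS_LIST|INGREDIENTS_INSTR]`, each row carries one relationship of either type, so `sum(lst.weight)` folds both kinds of weight into a single number. One way to keep them apart is conditional aggregation. In Cypher that would be something along the lines of `sum(CASE type(lst) WHEN 'INGREDIENTS_LIST' THEN lst.weight ELSE 0 END)`, which is not tested here. The Python sketch below mimics the idea over weights hand-copied from the sample data and then applies the 60/40 split from the question. `LIST_WEIGHTS`, `INSTR_WEIGHTS`, `alternation_rows` and `rank` are hypothetical names.

```python
from collections import defaultdict

# Per-recipe weights by stem, hand-copied from the sample data in the question.
LIST_WEIGHTS = {
    "Sugar cookies": {"butter": .7, "sugar": .6, "flour": .5, "salt": .4, "egg": .4, "milk": .2, "vanilla": .1},
    "Butter cookies": {"butter": .7, "sugar": .6, "flour": .5, "salt": .4, "egg": .3, "milk": .2, "vanilla": .1},
    "Flat bread": {"butter": .7, "salt": .6, "milk": .5, "flour": .4},
    "Gogel-Mogel": {"sugar": .6, "egg": .5},
}
INSTR_WEIGHTS = {
    "Sugar cookies": {"butter": .2, "sugar": .2, "flour": .2, "salt": .2, "egg": .1, "milk": .1},
    "Butter cookies": {"butter": .3, "sugar": .2, "flour": .2, "salt": .2, "egg": .2, "milk": .1},
    "Flat bread": {"butter": .3, "salt": .3, "milk": .1, "flour": .1},
    "Gogel-Mogel": {"sugar": .3, "egg": .3},
}


def alternation_rows(terms):
    """Yield (recipe, stem, rel_type, weight), one row per matching relationship,
    roughly what the alternation pattern produces."""
    for rel_type, table in (("INGREDIENTS_LIST", LIST_WEIGHTS),
                            ("INGREDIENTS_INSTR", INSTR_WEIGHTS)):
        for recipe, weights in table.items():
            for stem, weight in weights.items():
                if stem in terms:
                    yield recipe, stem, rel_type, weight


def rank(terms, list_share=0.6, instr_share=0.4):
    """Sum list and instruction weights separately, then blend them."""
    terms = set(terms)
    list_sum = defaultdict(float)
    instr_sum = defaultdict(float)
    matched = defaultdict(set)
    for recipe, stem, rel_type, weight in alternation_rows(terms):
        matched[recipe].add(stem)
        # conditional aggregation: route each weight by its relationship type
        if rel_type == "INGREDIENTS_LIST":
            list_sum[recipe] += weight
        else:
            instr_sum[recipe] += weight
    results = []
    for recipe, stems in matched.items():
        score = list_share * list_sum[recipe] + instr_share * instr_sum[recipe]
        results.append((recipe, len(terms) - len(stems), round(score, 2)))
    # fewest uncovered terms first, then highest blended score
    results.sort(key=lambda row: (row[1], -row[2]))
    return results


if __name__ == "__main__":
    for row in rank(["egg", "sugar", "chocolate", "milk", "flour", "salt"]):
        print(row)
```

On the sample data the 60/40 blend separates the two cookie recipes, which both miss only "chocolate": Sugar cookies scores 1.58 and Butter cookies 1.56.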

EDIT:

This seems to work. Is it correct?

WITH split("egg, sugar, chocolate, milk, flour, salt",", ") as terms
UNWIND terms as term
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-[lstl:INGREDIENTS_LIST]->(r:Recipe)<-[lsti:INGREDIENTS_INSTR]-(ingredient)
// terms is kept as a grouping key so size(terms) may be combined with count(); recent Neo4j versions reject implicit grouping keys
WITH r, terms, size(terms) - count(distinct stem) as notCovered, sum(lsti.weight) as wi, sum(lstl.weight) as wl, collect(distinct stem.stem) as matched
RETURN r, notCovered, matched, wl + wi as weight
ORDER BY notCovered ASC, weight DESC
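One thing to check in this query: `lstl` and `lsti` both hang off the same `ingredient` in a single MATCH pattern. That means an ingredient is counted only when it has *both* an INGREDIENTS_LIST and an INGREDIENTS_INSTR relationship to the recipe. In the sample data, vanilla appears in Sugar cookies' ingredient list but not in its instructions, so a search that includes "vanilla" would report it as not covered. The Python sketch below contrasts the two behaviours on a hand-copied subset of the data. In Cypher, an `OPTIONAL MATCH` for the instruction relationship would be one way to get the second behaviour. The names are hypothetical.

```python
# Sugar cookies only, per stem, from the sample data: vanilla is in the
# ingredient list (weight .1) but has no INGREDIENTS_INSTR relationship.
LIST_W = {"butter": 0.7, "sugar": 0.6, "vanilla": 0.1}
INSTR_W = {"butter": 0.2, "sugar": 0.2}


def matched_stems(terms, require_both):
    """Stems of one recipe that count as matched.

    require_both=True mimics one MATCH pattern that needs both relationships;
    require_both=False keeps list-only (or instruction-only) ingredients.
    """
    if require_both:
        return {t for t in terms if t in LIST_W and t in INSTR_W}
    return {t for t in terms if t in LIST_W or t in INSTR_W}


def summed_weights(terms, require_both):
    """(list weight, instruction weight), treating a missing relationship as 0."""
    stems = matched_stems(terms, require_both)
    return (sum(LIST_W.get(s, 0.0) for s in stems),
            sum(INSTR_W.get(s, 0.0) for s in stems))


if __name__ == "__main__":
    for both in (True, False):
        print(both, sorted(matched_stems({"vanilla", "sugar"}, both)))
```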

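Also, could you help with the second problem: given a list of ingredients, return the combinations of recipes that together contain the given ingredients? Thanks again!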
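The second search is a set-cover problem: find sets of recipes whose combined stems contain every search term. Finding the smallest such set is NP-hard in general. The Python sketch below therefore only enumerates combinations of at most `max_size` recipes and keeps the minimal ones, meaning combinations from which no recipe can be dropped. It is plain Python over a hand-copied summary of the sample data, not a Cypher query, and `minimal_covers` is a hypothetical helper.

```python
from itertools import combinations

# Stems per recipe, hand-copied from the sample data in the question.
RECIPES = {
    "Sugar cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "Butter cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "Flat bread": {"butter", "salt", "milk", "flour"},
    "Gogel-Mogel": {"sugar", "egg"},
}


def minimal_covers(recipes, terms, max_size=2):
    """Combinations of at most max_size recipes whose stems cover all terms,
    skipping any combination that contains a smaller cover already found."""
    terms = set(terms)
    # only recipes that contribute at least one term can help
    candidates = sorted(name for name, stems in recipes.items() if stems & terms)
    covers = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            if any(set(prev) <= set(combo) for prev in covers):
                continue  # not minimal: a subset already covers everything
            covered = set().union(*(recipes[name] for name in combo))
            if terms <= covered:
                covers.append(combo)
    return covers


if __name__ == "__main__":
    wanted = {"milk", "butter", "flour", "salt", "sugar", "egg"}
    for combo in minimal_covers(RECIPES, wanted):
        print(" + ".join(combo))
```

On the sample data this returns the two cookie recipes on their own plus the Flat bread and Gogel-Mogel pair from the example at the top of the question. Exhaustive search grows quickly with more recipes. For larger data sets the usual approximation is the greedy heuristic, which repeatedly takes the recipe covering the most still-uncovered terms.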

Answer 1

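I would go with your version 1).

Don't worry about the extra hops. You can put information about the amount/weight on the relationship between the recipe and the actual ingredient.

Multiple relationships between the same nodes are fine.

Here is an example query. It won't find a complete match on your data set, because you don't have a recipe that contains all of the ingredients: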

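WITH split("milk, butter, flour, salt, sugar, eggs",", ") as terms
UNWIND terms as term
// "eggs" is not stemmed here, so it does not match the Stem node "egg"
MATCH (stem:Stem {stem:term})-[:STEM_OF]->(ingredient:Ingredient)-->(r:Recipe)
// terms is kept as a grouping key so size(terms) may be combined with count()
WITH r, terms, size(terms) - count(distinct stem) as notCovered
RETURN r ORDER BY notCovered ASC LIMIT 2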

+-----------------------------------------+
| r                                       |
+-----------------------------------------+
| Node[0]{rid:"1",title:"Sugar cookies"}  |
| Node[1]{rid:"2",title:"Butter cookies"} |
+-----------------------------------------+
2 rows

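Here is an optimization for large data sets:

For the query, you would first find all the matching stems, and then the recipes attached to the most selective one (the one with the lowest degree).

Then check each of those recipes for the remaining ingredients.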

WITH split("milk, butter, flour, salt, sugar, eggs",", ") as terms
MATCH (stem:Stem) WHERE stem.stem IN terms
// most selective stem first (fewest STEM_OF relationships)
WITH stem, terms ORDER BY size([(stem)-[:STEM_OF]->() | 1]) ASC
WITH terms, collect(stem) as stems
WITH head(stems) as first, tail(stems) as rest, terms
MATCH (first)-[:STEM_OF]->(:Ingredient)-->(r:Recipe)
// a recipe can be reached through several relationships; keep one row per recipe
WITH DISTINCT r, rest, terms
// count how many of the remaining stems also lead to this recipe
WITH r, size(terms) - 1 - size([other IN rest WHERE (other)-[:STEM_OF]->(:Ingredient)-->(r)]) as notCovered
RETURN r ORDER BY notCovered ASC LIMIT 2
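The selective-first strategy can be sketched in plain Python over a hand-copied subset of the sample graph, using the stem "egg" rather than "eggs". `selective_first` and the adjacency maps are hypothetical names. The sketch also makes the trade-off visible: only recipes that contain the most selective stem are ever considered. Here milk is the starting stem, so Gogel-Mogel is never scored.

```python
# Hand-copied subset of the sample graph: stem -> ingredients -> recipes.
STEM_OF = {
    "milk": ["milk"],
    "butter": ["salted butter", "unsalted butter"],
    "flour": ["all purpose flour", "wheat flour"],
    "salt": ["iodized salt", "himalayan salt"],
    "sugar": ["white sugar", "brown sugar", "powdered sugar"],
    "egg": ["eggs"],
}
USED_IN = {
    "milk": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "salted butter": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "unsalted butter": set(),
    "all purpose flour": {"Sugar cookies", "Butter cookies"},
    "wheat flour": {"Flat bread"},
    "iodized salt": {"Sugar cookies", "Butter cookies", "Flat bread"},
    "himalayan salt": set(),
    "white sugar": {"Sugar cookies", "Gogel-Mogel"},
    "brown sugar": {"Butter cookies"},
    "powdered sugar": set(),
    "eggs": {"Sugar cookies", "Butter cookies", "Gogel-Mogel"},
}


def recipes_of(stem):
    """All recipes reachable from a stem via its ingredients."""
    found = set()
    for ingredient in STEM_OF.get(stem, []):
        found |= USED_IN.get(ingredient, set())
    return found


def selective_first(terms, limit=2):
    """Start from the lowest-degree stem, then count the other stems per recipe."""
    stems = [t for t in terms if t in STEM_OF]   # only terms that exist as stems
    stems.sort(key=lambda s: len(STEM_OF[s]))    # fewest ingredients first (stable)
    if not stems:
        return []
    first, rest = stems[0], stems[1:]
    results = []
    # candidate recipes come only from the most selective stem
    for recipe in sorted(recipes_of(first)):
        covered = sum(1 for other in rest if recipe in recipes_of(other))
        results.append((recipe, len(terms) - 1 - covered))
    results.sort(key=lambda row: row[1])
    return results[:limit]


if __name__ == "__main__":
    print(selective_first(["milk", "butter", "flour", "salt", "sugar", "egg"]))
```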

Neo4j data model for searching documents by keywords and stems
