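I have a Neo4j 3.3.5 graph database: 27 GB, 50M nodes, 500M relationships. Indexes and a schema are in place. PC: 16 GB RAM, 4 cores.
The task is to find the best-matching companies for given query data. The :Company nodes, which are what I need to return, have several kinds of relationships to other nodes such as :Branch, :Country, etc. The query data contains BranchIds, CountryIds, etc.
Currently I am using Cypher like this to get the scores from a single relationship (the result is 500k rows):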
MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in [27444, 1692, 23409, ...] //around 10 ids per query
RETURN
c.companyId as Id,
case r.branchType
when 0 then 25
... // around 7 conditions per query
when 10 then 20
end as Score
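I have to get such results for all relationship types, group them by company Id, sum the Score, order by it, and take the top 100 results.
Because post-UNION processing is not available, I use collect + unwind to merge the scores from all relationships.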
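To make the intended result concrete, here is a minimal Python sketch of the merge step described above: sum the scores per company Id across all relationship types, then keep the top N. The sample rows and the function name top_companies are made up for illustration; this only shows what the combined query should return, not how Neo4j would execute it.

```python
from collections import Counter

# Hypothetical sample rows: (companyId, score) pairs, one list per relationship type.
branch_rows = [(1, 25), (2, 19), (1, 20)]
reverted_branch_rows = [(2, 25), (3, 19)]
country_rows = [(1, 30), (3, 15)]


def top_companies(row_sets, limit=100):
    """Sum scores per company over all row sets and return the best `limit`."""
    totals = Counter()
    for rows in row_sets:
        for company_id, score in rows:
            totals[company_id] += score
    # Highest score first; ties are broken by company id to keep the order stable.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


result = top_companies([branch_rows, reverted_branch_rows, country_rows])
```

With these sample rows, company 1 collects 25 + 20 + 30, company 2 collects 19 + 25, and company 3 collects 19 + 15, so the ranking is 1, 2, 3.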
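Unfortunately, performance is poor. A query for a single relationship (as above) responds in 5-10 seconds. When I try to combine the results with collect + unwind, the query "never" finishes.
What is a better or more correct way to do this? Am I doing something wrong in the graph design? Is the hardware too weak? Or is there some algorithm for matching a score graph (the query data) against a graph database?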
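Update
Query explanation:
Users can search for companies in our system. For each search we prepare query data containing the IDs of branches, countries, words, etc. As the query result we want a list of the best-matching company IDs.
E.g. a user may search for new companies producing wooden tables in Spain.
Full query example: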
MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
WITH case r.branchType
when "0" then collect({id:c.companyId, score: 25})
when "1" then collect({id:c.companyId, score: 19})
when "2" then collect({id:c.companyId, score: 20})
when "3" then collect({id:c.companyId, score: 19})
when "4" then collect({id:c.companyId, score: 20})
when "5" then collect({id:c.companyId, score: 15})
when "6" then collect({id:c.companyId, score: 6})
when "7" then collect({id:c.companyId, score: 5})
when "8" then collect({id:c.companyId, score: 4})
when "9" then collect({id:c.companyId, score: 4})
when "10" then collect({id:c.companyId, score: 20})
end as rows
MATCH (c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
WITH rows + case r.branchType
when "0" then collect({id:c.companyId, score: 25})
when "1" then collect({id:c.companyId, score: 19})
when "2" then collect({id:c.companyId, score: 20})
when "3" then collect({id:c.companyId, score: 19})
when "10" then collect({id:c.companyId, score: 20})
end as rows
MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"]
WITH rows + case r.branchType
when "0" then collect({id:c.companyId, score: 30})
when "2" then collect({id:c.companyId, score: 15})
end as rows
... //here I would add in future other relations scoring
UNWIND rows AS row
RETURN row.id AS Id, sum(row.score) AS Score
ORDER BY Score DESC
LIMIT 100
Answer 0 (score: 1)
You can try this query and see whether it performs better:
MATCH (c:Company) WITH c
OPTIONAL MATCH (c)-[r1:HAS_BRANCH]->(b:Branch) WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
OPTIONAL MATCH (c)-[r2:HAS_REVERTED_BRANCH]->(rb:Branch) WHERE rb.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
OPTIONAL MATCH (c)-[r3:HAS_COUNTRY]->(cou:Country) WHERE cou.countryId in ["9580" , "18551" , "15895"]
WITH c,
case r1.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "4" then 20
when "5" then 15
when "6" then 6
when "7" then 5
when "8" then 4
when "9" then 4
when "10" then 20
end as branchScore,
case r2.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "10" then 20
end as revertedBranchScore,
case r3.branchType
when "0" then 30
when "2" then 15
end as countryScore
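// coalesce() keeps a missing (null) score from turning the whole sum into null.
// Note: the three OPTIONAL MATCHes multiply rows per company, so sum() can count a score more than once.
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score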
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
Or, better still, this one (but only if every Company node is guaranteed to be related to a Country and a Branch):
MATCH
(c:Company)-[r1:HAS_BRANCH]->(b:Branch),
(c)-[r2:HAS_REVERTED_BRANCH]->(rb:Branch),
(c)-[r3:HAS_COUNTRY]->(cou:Country)
WHERE
b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
rb.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
cou.countryId in ["9580" , "18551" , "15895"]
WITH c,
case r1.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "4" then 20
when "5" then 15
when "6" then 6
when "7" then 5
when "8" then 4
when "9" then 4
when "10" then 20
end as branchScore,
case r2.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "10" then 20
end as revertedBranchScore,
case r3.branchType
when "0" then 30
when "2" then 15
end as countryScore
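// coalesce() guards against a branchType with no matching case, which would yield null.
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score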
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
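Answer 1 (score: 0)
Let's see if we can keep the match cardinality low by using pattern comprehensions and the reduce() function to update each company's score as the query proceeds, waiting until the end to project the id property: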
MATCH (c:Company)
WITH c, [(c)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as hasBranchTypes
WITH c, reduce(runningScore = 0, type in hasBranchTypes | runningScore +
case type
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "4" then 20
when "5" then 15
when "6" then 6
when "7" then 5
when "8" then 4
when "9" then 4
when "10" then 20
else 0 end ) as score
WITH c, score, [(c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as revertedBranchTypes
WITH c, reduce(runningScore = score, type in revertedBranchTypes | runningScore +
case type
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "10" then 20
else 0 end ) as score
WITH c, score, [(c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"] | r.branchType] as hasCountryTypes
WITH c, reduce(runningScore = score, type in hasCountryTypes | runningScore +
case type
when "0" then 30
when "2" then 15
else 0 end ) as score
//here I would add in future other relations scoring
WITH c, score
ORDER BY score DESC
LIMIT 100
RETURN c.companyId as Id, score as Score