Performance of scoring complex matches in a Neo4j graph database?

Time: 2018-04-24 09:14:52

Tags: neo4j cypher graph-databases

I have a Neo4j 3.3.5 graph database: 27 GB, 50 million nodes, 500 million relationships. Indexes and a schema are in place. The machine has 16 GB of RAM and 4 cores.

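The task is to find the best-matching companies for a given set of query data. The :Company nodes, which are what I need to return, have multiple relationships to other nodes such as :Branch, :Country, etc. The query data contains BranchIds, CountryIds, etc.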

Currently I am using Cypher like this to get the scores from a single relationship type (the result is about 500k rows):

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in [27444, 1692, 23409, ...] //around 10 ids per query
RETURN 
c.companyId as Id, 
case r.branchType 
 when 0 then 25
 ... // around 7 conditions per query
 when 10 then 20 
end as Score

I have to do this for every relationship type, then group by company Id, sum the Score, order the result, and take the top 100.
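For a single relationship type, the group/sum/order/top-100 step could be sketched as follows. This is only an illustration: it reuses the three ids and the two CASE branches visible in the query above, and the `ELSE 0` default is an assumption standing in for the elided conditions.

```cypher
// Sketch: aggregate the per-relationship score for each company, then keep the top 100.
MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId IN [27444, 1692, 23409]
WITH c.companyId AS Id,
     sum(CASE r.branchType
           WHEN 0 THEN 25
           WHEN 10 THEN 20
           ELSE 0            // assumption: branch types not listed here score nothing
         END) AS Score
RETURN Id, Score
ORDER BY Score DESC
LIMIT 100
```

Because `Id` is the only non-aggregated column in the WITH clause, Cypher uses it as the grouping key for `sum()`.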

Because post-UNION processing is not available, I use collect + unwind to merge the scores from all relationship types.

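Unfortunately, performance is poor. A query over a single relationship type (like the one above) takes 5-10 seconds to respond. When I try to combine the results using collect + unwind, the query "never" finishes.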

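What is a better/correct way to do this? Am I doing something wrong in the graph design? Is the hardware too weak? Or is there some algorithm for matching a scored graph (the query data) against a graph database?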

Update

Query explanation:

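Users can search for companies in our system. For each search we prepare query data containing the IDs of branches, countries, words, etc. As the query result, we want a list of the best-matching company IDs.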

E.g., a user can search for new companies from Spain that produce wooden tables.

Combined query example:

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "4" then collect({id:c.companyId, score: 20}) 
 when "5" then collect({id:c.companyId, score: 15}) 
 when "6" then collect({id:c.companyId, score: 6}) 
 when "7" then collect({id:c.companyId, score: 5}) 
 when "8" then collect({id:c.companyId, score: 4}) 
 when "9" then collect({id:c.companyId, score: 4}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 30}) 
 when "2" then collect({id:c.companyId, score: 15}) 
 end as rows
... //here I would add in future other relations scoring
UNWIND rows AS row
RETURN row.id AS Id, sum(row.score) AS Score
ORDER BY Score DESC
LIMIT 100
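For comparison, here is a hedged sketch of the same scoring written with post-UNION processing. It assumes Neo4j 4.0 or later, where `CALL { ... }` subqueries are available, so it does not run on the 3.3.5 instance described above. Only two of the relationship types are shown, the id lists are shortened, and the `ELSE 0` defaults are an added assumption.

```cypher
// Sketch (assumes Neo4j 4.0+): each UNION ALL branch emits (c, score) rows,
// and the outer query aggregates across all branches.
CALL {
  MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
  WHERE b.branchId IN ["27444", "1692", "23409"]
  RETURN c, CASE r.branchType
              WHEN "0" THEN 25
              WHEN "1" THEN 19
              WHEN "2" THEN 20
              ELSE 0          // assumption: remaining types omitted for brevity
            END AS score
  UNION ALL
  MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
  WHERE cou.countryId IN ["9580", "18551", "15895"]
  RETURN c, CASE r.branchType
              WHEN "0" THEN 30
              WHEN "2" THEN 15
              ELSE 0
            END AS score
}
RETURN c.companyId AS Id, sum(score) AS Score
ORDER BY Score DESC
LIMIT 100
```

Each branch of the union stays an independent match, so adding another relationship type means adding another `UNION ALL` branch rather than another collect + unwind round.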

2 Answers:

Answer 0: (score: 1)

You can try this query and see whether it performs better:

MATCH (c:Company) WITH c
OPTIONAL MATCH (c)-[r1:HAS_BRANCH]->(b:Branch) WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
OPTIONAL MATCH (c)-[r2:HAS_REVERTED_BRANCH]->(b2:Branch) WHERE b2.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
OPTIONAL MATCH (c)-[r3:HAS_COUNTRY]->(cou:Country) WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH c, 
    case r1.branchType 
      when "0" then 25
      when "1" then 19 
      when "2" then 20 
      when "3" then 19 
      when "4" then 20 
      when "5" then 15 
      when "6" then 6 
      when "7" then 5 
      when "8" then 4 
      when "9" then 4 
      when "10" then 20 
    end as branchScore,
    case r2.branchType 
      when "0" then  25 
      when "1" then  19 
      when "2" then  20 
      when "3" then  19 
      when "10" then  20 
    end as revertedBranchScore,
    case r3.branchType 
      when "0" then  30
      when "2" then  15 
    end as countryScore

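// CASE without ELSE yields null, and null + n is null, so default each part to 0
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score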
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100

Or, even better, this one (but it only works if every Company node is guaranteed to be related to both a Country and a Branch):

MATCH 
  (c:Company)-[r1:HAS_BRANCH]->(b:Branch),
  (c)-[r2:HAS_REVERTED_BRANCH]->(b2:Branch),
  (c)-[r3:HAS_COUNTRY]->(cou:Country)
WHERE 
  b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND 
  b2.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
  cou.countryId in ["9580" , "18551" , "15895"]
WITH c, 
    case r1.branchType 
      when "0" then 25
      when "1" then 19 
      when "2" then 20 
      when "3" then 19 
      when "4" then 20 
      when "5" then 15 
      when "6" then 6 
      when "7" then 5 
      when "8" then 4 
      when "9" then 4 
      when "10" then 20 
    end as branchScore,
    case r2.branchType 
      when "0" then  25 
      when "1" then  19 
      when "2" then  20 
      when "3" then  19 
      when "10" then  20 
    end as revertedBranchScore,
    case r3.branchType 
      when "0" then  30
      when "2" then  15 
    end as countryScore

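// CASE without ELSE yields null, and null + n is null, so default each part to 0
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score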
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
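Whether either variant actually performs better can be measured rather than guessed. A minimal sketch, assuming an index on :Branch(branchId) exists (the question only says that indexes are in place): prefixing a query with `PROFILE` runs it and returns the execution plan with rows and db hits per operator.

```cypher
// Sketch: profile just the branch lookup to see how the planner starts the query.
PROFILE
MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId IN ["27444", "1692", "23409"]
RETURN count(*) AS matchedRelationships
```

A plan that begins with a NodeIndexSeek on :Branch touches far fewer nodes than one that begins with a NodeByLabelScan over every :Company node, so the first operator is usually the first thing worth checking.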

Answer 1: (score: 0)

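Let's see if we can keep the match cardinality down by using pattern comprehension and the reduce() function, updating each company's score as the query proceeds and waiting until the end to project out the id property: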

MATCH (c:Company)
WITH c, [(c)-[r:HAS_BRANCH]->(b:Branch) 
 WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as hasBranchTypes
WITH c, reduce(runningScore = 0, type in hasBranchTypes | runningScore + 
 case type 
 when "0" then 25
 when "1" then 19
 when "2" then 20 
 when "3" then 19 
 when "4" then 20 
 when "5" then 15 
 when "6" then 6 
 when "7" then 5 
 when "8" then 4 
 when "9" then 4 
 when "10" then 20 
 else 0 // any other type adds nothing (without ELSE, CASE yields null and nulls the sum)
 end ) as score

WITH c, score, [(c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
 WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as revertedBranchTypes
WITH c, reduce(runningScore = score, type in revertedBranchTypes | runningScore + 
 case type
 when "0" then 25
 when "1" then 19 
 when "2" then 20 
 when "3" then 19 
 when "10" then 20 
 else 0 // types "4" to "9" are not scored here (without ELSE, CASE yields null and nulls the sum)
end ) as score

WITH c, score, [(c:Company)-[r:HAS_COUNTRY]->(cou:Country)
 WHERE cou.countryId in ["9580" , "18551" , "15895"] | r.branchType] as hasCountryTypes
WITH c, reduce(runningScore = score, type in hasCountryTypes | runningScore + 
 case type
 when "0" then 30 
 when "2" then 15 
 else 0 // unlisted types add nothing (without ELSE, CASE yields null and nulls the sum)
 end ) as score
 //here I would add in future other relations scoring

WITH c, score
ORDER BY score DESC
LIMIT 100
RETURN c.companyId as Id, score as Score