My data model is quite simple:
(n:User)-[:WANTS]->(c:Card)<-[:HAS]-(o:User)
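For concreteness, here is a minimal sketch of sample data matching this model (the ids, card name, and qty values are made up; qty is the property the update statements below rely on):

// hypothetical sample data for the model above
CREATE (a:User {id: 1}), (b:User {id: 2}), (c:Card {name: 'Some Card'})
CREATE (a)-[:WANTS {qty: 2}]->(c)
CREATE (b)-[:HAS {qty: 3}]->(c);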
Whenever a user updates the cards in their wants list, I create outgoing :FOLLOWS connections to the users who have that card in their have list. At the same time, I create incoming :FOLLOWS connections from the users who want a card in this user's have list, like so:
// update my total wants
MATCH (u:User)-[w:WANTS]-()
WHERE u.id = 1
WITH u, SUM(w.qty) AS wqty
SET u.wqty = wqty
RETURN wqty;
// delete all my incoming and outgoing follows
MATCH (u1:User {id: 1})-[f:FOLLOWS]-() DELETE f;
// outgoing follows
MATCH (u1:User)-[w:WANTS]->(c:Card)<-[h:HAS]-(u2:User)
WHERE u1.id = 1 AND u1.id <> u2.id
WITH u1, u2, (CASE WHEN h.qty > w.qty THEN w.qty ELSE h.qty END) AS haves
WITH u1.id AS id1, u2.id AS id2, SUM(haves) AS weight
MATCH (uf:User), (ut:User)
WHERE uf.id = id1 AND ut.id = id2
MERGE (uf)-[f:FOLLOWS]->(ut)
SET f.weight = weight;
// incoming follows
MATCH (u1:User)-[h:HAS]->(c:Card)<-[w:WANTS]-(u2:User)
WHERE u1.id = 1 AND u1.id <> u2.id
WITH u1, u2, (CASE WHEN h.qty > w.qty THEN w.qty ELSE h.qty END) AS haves
WITH u1.id AS id1, u2.id AS id2, SUM(haves) AS weight
MATCH (uf:User), (ut:User)
WHERE uf.id = id1 AND ut.id = id2
MERGE (uf)<-[f:FOLLOWS]-(ut)
SET f.weight = weight;
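One assumption baked into all of the statements above: the u.id lookups are backed by a unique constraint on :User(id), created with something like:

// assumed setup, not part of the update statements themselves
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;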
I decided to maintain these hard-coded :FOLLOWS relationships on every inventory update because querying trade potential from the cards on the fly proved very expensive. This way, a user can check their trade potential by running the following query:
MATCH (u1:User {id: 1})-[f1:FOLLOWS]->(u2:User)-[f2:FOLLOWS]->(u1)
RETURN u2.id, f1.weight AS num_cards_i_need, f2.weight AS num_cards_they_need
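In practice I'd likely rank the results as well, e.g. (the ORDER BY/LIMIT here is illustrative, not part of my current query):

// sketch: rank mutual-trade partners by combined weight
MATCH (u1:User {id: 1})-[f1:FOLLOWS]->(u2:User)-[f2:FOLLOWS]->(u1)
RETURN u2.id, f1.weight AS num_cards_i_need, f2.weight AS num_cards_they_need
ORDER BY f1.weight + f2.weight DESC
LIMIT 10;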
This approach is very fast on my test database, where the incoming/outgoing follows have only been computed for a single user.
Now the problem. I have a small number of nodes: 50k users and 14k cards. However, each user follows roughly 30k other users on average, which works out to about 1.5 billion relationships (50k × 30k). I expect the data store to come to roughly 20-30GB once loaded into neo4j.
My question is: do I need to be able to fit the entire database into memory to get fast reads, plus fast and frequent writes of the follow relationships? Assuming I don't have the resources to rent a large-memory instance from Amazon and I'm limited to conventional server hardware, what optimizations should I make so that reading and writing :FOLLOWS stays fast?
I clearly have enough memory for the nodestore, and I also have some memory for the relationship store covering the user->card relationships, but not for the user->user relationships. Can I choose which of these get loaded into memory, so that those at least are "warm"?
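For example, I know the generic trick of warming the cache by touching everything once, along the lines of:

// generic cache warm-up: touch every node and relationship once
MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN COUNT(n), COUNT(r);

but with 1.5 billion relationships that tries to warm far more than fits in RAM, and it doesn't let me prefer the user->card relationships over the user->user ones.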