I have data with the following structure:
{"id": "1", "name": "A. I. Lazarev", "org": "United States Department of State", "tags": [{"t": "Infrared"}, {"t": "Near-infrared spectroscopy"}, {"t": "Infrared astronomy"}, {"t": "Data collection"}], "pubs": [{"i": "1542417502", "r": 6}], }
{"id": "2", "name": "Stevan Spremo", "tags": [{"t": "Micro-g environment"}, {"t": "Antibiotics"}, {"t": "Bacteriology"}], "pubs": [{"i": "222163962", "r": 0}], }
{"id": "3", "name": "Bricchi G", "pubs": [{"i": "2417067698", "r": 1}, {"i": "2406980973", "r": 1}]}
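Some rows have tags, some have an organization, some have both, and some have neither.
I want to add relationships between (1) authors and tags, (2) authors and organizations, and (3) authors and publications. I already have the publications as nodes, so once I have (1) and (2) working, (3) should be easy to get.
I have been trying the following code: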
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:/test.txt') YIELD value AS q RETURN q",
"UNWIND q.id as id
CREATE (a:Author {id:id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {{name: tags.t}})
CREATE (a)-[:HAS_TAGS]->(t)
WITH q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false})
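Tags and organizations appear many times in the data, but there should be only one node per tag and per organization, so I used MERGE to create unique nodes for them.
The problem with the code above is that it creates duplicate AFFILIATED_WITH relationships. In fact, it creates as many AFFILIATED_WITH relationships as the author has tags.
How can I change the Cypher query so that it does not create duplicate relationships?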
Answer 0 (score: 3)
After this clause:
UNWIND q.tags as tags
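your query has as many data rows as the current q has tags (each row carries the values q, a, id, and tags). Every subsequent operation is executed once per data row. That is why you end up creating too many AFFILIATED_WITH relationships.
To fix this, you need to reduce the number of data rows at the appropriate point (this also speeds up processing, since it avoids needlessly repeated operations). In your case, you just need to change the second WITH q, a clause to WITH DISTINCT q, a: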
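The row multiplication described above can be imitated outside Neo4j. The following is only a toy Python model of Cypher's row semantics, not Neo4j code: the unwind helper and the "Author(1)" placeholder are hypothetical names introduced for illustration, and the record is the first sample row from the question.

```python
# Toy model of Cypher's row semantics; this is not Neo4j code.
def unwind(rows, list_key, as_name):
    """Emit one output row per element of row['q'][list_key], like UNWIND."""
    out = []
    for row in rows:
        for item in row["q"].get(list_key) or []:
            out.append({**row, as_name: item})
    return out

q = {
    "id": "1",
    "org": "United States Department of State",
    "tags": [{"t": "Infrared"}, {"t": "Near-infrared spectroscopy"},
             {"t": "Infrared astronomy"}, {"t": "Data collection"}],
}

rows = [{"q": q, "a": "Author(1)"}]   # one row after CREATE (a:Author ...)
rows = unwind(rows, "tags", "tags")   # UNWIND q.tags AS tags

# WITH q, a keeps every row, so the following CREATE runs once per row.
affiliated = [(r["a"], r["q"]["org"]) for r in rows
              if r["q"].get("org") is not None]
print(len(rows), len(affiliated))
```

With four tags there are four rows after the UNWIND, so the model "creates" four AFFILIATED_WITH pairs for a single author, which matches the duplication reported in the question.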
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
"CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {name: tags.t})
CREATE (a)-[:HAS_TAGS]->(t)
WITH DISTINCT q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false}
)
I also simplified the query by removing the unnecessary UNWIND q.id as id clause, and fixed some syntax issues.
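To see why DISTINCT is enough here, the sketch below models the two projections in plain Python. It is an illustration under assumptions, not Neo4j code: distinct is a hypothetical helper, and the strings "q1" and "Author(1)" stand in for the map q and the node a.

```python
def distinct(rows, keys):
    """Keep one row per distinct combination of keys, like WITH DISTINCT."""
    seen, out = set(), []
    for row in rows:
        ident = tuple(row[k] for k in keys)
        if ident not in seen:
            seen.add(ident)
            out.append({k: row[k] for k in keys})
    return out

# Four rows, as left behind by UNWIND for an author with four tags.
tag_names = ("Infrared", "Near-infrared spectroscopy",
             "Infrared astronomy", "Data collection")
rows = [{"q": "q1", "a": "Author(1)", "tags": t} for t in tag_names]

plain = [{"q": r["q"], "a": r["a"]} for r in rows]   # WITH q, a
deduped = distinct(rows, ["q", "a"])                 # WITH DISTINCT q, a
print(len(plain), len(deduped))
```

Dropping the tags column makes the four rows identical, but only the DISTINCT projection collapses them into one, so the MERGE and CREATE that follow run once per author.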
[UPDATED]
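If you want to add AUTHORED relationships (as requested in the comments on this answer), you should do so before creating the AFFILIATED_WITH relationships, because the WHERE q.org is not null clause filters out some of the q rows. Also, whenever you use CREATE to create a relationship, Cypher requires you to specify a direction for that relationship.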
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
"CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {name: tags.t})
CREATE (a)-[:HAS_TAGS]->(t)
WITH DISTINCT q, a
UNWIND q.pubs as pubs
MERGE (p:Quanta {id: pubs.i})
CREATE (a)-[r:AUTHORED {rank: pubs.r}]->(p)
WITH DISTINCT q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false}
)
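One thing worth checking against your own data: in Cypher, UNWIND over a null or empty list produces no rows at all, so an author without tags would drop out of the query before the publication and organization steps are reached. The toy Python model below (again not Neo4j code, with unwind as a hypothetical helper) shows the effect on the second and third sample records from the question.

```python
def unwind(rows, list_key, as_name):
    """Emit one row per list element; a missing or empty list emits nothing."""
    out = []
    for row in rows:
        for item in row["q"].get(list_key) or []:
            out.append({**row, as_name: item})
    return out

authors = [
    {"id": "2",
     "tags": [{"t": "Micro-g environment"}, {"t": "Antibiotics"},
              {"t": "Bacteriology"}],
     "pubs": [{"i": "222163962", "r": 0}]},
    {"id": "3",
     "pubs": [{"i": "2417067698", "r": 1}, {"i": "2406980973", "r": 1}]},
]

# Count how many rows each author still has after UNWIND q.tags.
surviving = {}
for q in authors:
    rows = unwind([{"q": q}], "tags", "tags")
    surviving[q["id"]] = len(rows)
print(surviving)
```

In this model author 3 has zero rows left, so none of the later clauses would run for it. A commonly used alternative is to perform the tag writes inside a FOREACH over coalesce(q.tags, []), which leaves the row count unchanged; treat that as a suggestion to verify on your data rather than as part of the answer above.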