Neo4j server crashes when importing 70K rows from CSV. Unable to optimize the query

Time: 2020-05-16 11:56:53

Tags: neo4j graphql neo4j-apoc

I'm still quite new to neo4j and need some help optimizing an import query. At the moment, the database connection either crashes or the query runs forever.

I have roughly 70K rows, which boil down to about 25K unique nodes. I'm using the script below to import the data and create the relationships. The CSV file itself is around 60 MB.
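
For reference, the import query below reads the following columns from the CSV, so the header row presumably looks roughly like this (the column order is inferred from the query, not taken from the actual file):

_id,sessionField,referer_type,refererField,location,browserName,first,eventType,linkClick,scrollPage,buttonClick,visitorSearch,formSubmission,visitorEmail,timestamp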

Creating constraints/indexes

CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE
CREATE INDEX ON :Session(id)
CREATE INDEX ON :ReferrerType(id)
CREATE INDEX ON :Referrer(id)
CREATE INDEX ON :Landing(id)
CREATE INDEX ON :eventType(id)
CREATE INDEX ON :conversion(id)
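
Side note: on Neo4j 4.x the CREATE INDEX ON :Label(property) form is deprecated in favour of the FOR ... ON syntax. A minimal sketch of the equivalent statements, assuming a 4.x server (the 3.x form above still works there, just with a deprecation warning):

// Neo4j 4.x style index creation (same indexes as above)
CREATE INDEX FOR (s:Session) ON (s.id)
CREATE INDEX FOR (rt:ReferrerType) ON (rt.id)
CREATE INDEX FOR (rf:Referrer) ON (rf.id)
CREATE INDEX FOR (l:Landing) ON (l.id)
CREATE INDEX FOR (et:eventType) ON (et.id)
CREATE INDEX FOR (cf:conversion) ON (cf.id)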

Importing the data

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///df_all.csv' AS row
MERGE (p:Person {id: row._id})
MERGE (s:Session {id: row.sessionField})
MERGE (rt:ReferrerType {id: row.referer_type})
MERGE (rf:Referrer {id: row.refererField})
MERGE (l:Landing {id: row.location, device: row.browserName, firstTime: row.first})
MERGE (et:eventType {id: row.eventType, links: row.linkClick, scroll: row.scrollPage, button: row.buttonClick, search: row.visitorSearch, forms: row.formSubmission})
MERGE (cf:conversion {id: row.visitorEmail})
FOREACH(
  x IN CASE WHEN s.referer_type IS NULL OR NOT row.referer_type IN s.referer_type THEN [1] END |
  SET s.referer_type = COALESCE(s.referer_type, []) + row.referer_type
)
FOREACH(
  x IN CASE WHEN rf.location IS NULL OR NOT row.location IN rf.location THEN [1] END |
  SET rf.location = COALESCE(rf.location, []) + row.location
)
FOREACH(
  x IN CASE WHEN l.eventType IS NULL OR NOT row.eventType IN l.eventType THEN [1] END |
  SET l.eventType = COALESCE(l.eventType, []) + row.eventType
)
FOREACH(
  x IN CASE WHEN l.visitorEmail IS NULL OR NOT row.visitorEmail IN l.visitorEmail THEN [1] END |
  SET l.visitorEmail = COALESCE(l.visitorEmail, []) + row.visitorEmail
)
MERGE (p)-[:HAS_SESSION]->(s)-[:REF_TYPE]->(rt)-[:FROM]->(rf)-[:LANDS_ON]->(l)-[:EVENT_DONE]->(et)<-[:VISITOR_TYPE]-(cf)
RETURN p, s, rt, rf, l, et, cf ORDER BY row._id,row.sessionField,row.timestamp
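
One detail worth flagging (a general observation, not something measured on this data set): the trailing RETURN ... ORDER BY streams one result row per CSV line back to the client and has to gather all rows before sorting, which works against USING PERIODIC COMMIT. A minimal sketch of how the import would end without it:

// ...same MERGE statements as above...
MERGE (p)-[:HAS_SESSION]->(s)-[:REF_TYPE]->(rt)-[:FROM]->(rf)-[:LANDS_ON]->(l)-[:EVENT_DONE]->(et)<-[:VISITOR_TYPE]-(cf)
// no RETURN / ORDER BY: an import query does not need to send the merged nodes back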

I'm not sure how to optimize the query any further. I'm using a GCP VM with 8 cores and 30 GB of memory. Most of the cores sit idle, and I never see more than 1-10 GB of memory being consumed.
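
One thing to check, given that most of the 30 GB stays unused, is how much memory the server is actually allowed to take; the defaults in neo4j.conf are fairly small. Illustrative settings only (the exact values would need to be sized for this machine):

# neo4j.conf (illustrative values, not a tuned recommendation)
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
dbms.memory.pagecache.size=10g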

I usually get this warning: The execution plan for this query contains the Eager operator
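
For context, the Eager operator shows up because the planner cannot prove that the many MERGE/SET operations on different labels are independent, so it accumulates all CSV rows before writing; that effectively disables the periodic-commit batching and matches the crash-or-run-forever symptom. A commonly suggested workaround (sketched here, not verified against this data) is to split the import into several passes over the file, each doing a single kind of MERGE, with a final pass for the relationships:

// Pass 1: Person nodes only
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///df_all.csv' AS row
MERGE (p:Person {id: row._id});

// Pass 2: Session nodes only
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///df_all.csv' AS row
MERGE (s:Session {id: row.sessionField});

// ...one pass per label, then a final pass that MATCHes the existing nodes and MERGEs the relationship chain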

0 Answers:

There are no answers yet.