Importing CSV relationships into Neo4j

Date: 2016-04-03 20:42:35

Tags: mysql csv neo4j cypher

I'm trying to import data from a MySQL database into Neo4j, using CSV files as an intermediary. I'm following the basic example, but I can't get it to work. I'm importing two tables with these queries:

//Import projects.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/projects.csv" AS row
CREATE (:project
{
     project_id: row.fan,
     project_name: row.project_name
});

//Import people.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/persons.csv" AS row
CREATE (:person
{
     person_id: row.person_id,
     person_name: row.person_name
});

//Create indices.
CREATE INDEX ON :project(project_id);
CREATE INDEX ON :project(project_name);
CREATE INDEX ON :person(person_id);
CREATE INDEX ON :person(person_name);

This part works. What doesn't work is when I try to import the relationships:

//Create project-person relationships.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);

The console accepts the query without errors, but it never finishes. It has been running for days at 100% CPU and 25% RAM, with negligible disk usage. The database information shows no relationships.

Have I made a mistake somewhere, or is it really this slow? The project_persons.csv file is 13 million lines long, but shouldn't the periodic commit make something show up by now?

1 Answer:

Answer 0 (score: 0)

shouldn't the periodic commit make something show up by now?

It only applies to the LOAD. Put EXPLAIN in front of the query and it will tell you how it is building the update and how many records it expects to process. I ran into the same problem: Neo4j performs the entire update as a single transaction, and it never completes. The transaction needs to be broken into blocks of 50K-100K tx for the work to get done.

One way to do this is to import the relationship information as a set of tagged nodes, then use those nodes to MATCH() the person and project nodes and create the relationships as needed.
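As a sketch, prefixing the relationship query from the question with EXPLAIN (with the USING PERIODIC COMMIT hint dropped, since it isn't needed when only inspecting the plan) shows the planned operators and estimated row counts without executing anything:

```
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
MATCH (project:project {project_id: row.project_id})
MATCH (person:person {person_id: row.person_id})
MERGE (person)-[:CONTRIBUTED]->(project);
```

If the plan shows node-by-label scans instead of index lookups for the two MATCH clauses, each of the 13 million CSV rows triggers full scans, which would explain the query never finishing.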

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/project_persons.csv" AS row
CREATE (:Relations {project_id: row.project_id, person_id: row.person_id})

Then process the records in batches of 50K:

MATCH (r:Relations) 
MATCH (prj:project {project_id: r.project_id})
MATCH (per:person {person_id: r.person_id})
WITH r, prj, per LIMIT 50000
MERGE (per)-[:CONTRIBUTED]->(prj)
DELETE r

Run this multiple times, until all the relationships have been created and you're happy.
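To know when you are done, you can count the remaining staging nodes between runs; once this returns 0, every relationship row has been processed and deleted:

```
MATCH (r:Relations) RETURN count(r);
```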