Question

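I'm building an application in which users can manage dictionaries. One feature is uploading a file to initialize or update a dictionary's content.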

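The part of the structure I'm focusing on first is Dictionary -[:CONTAINS]->Word. Starting from an empty database (Neo4j 1.9.4, but I also tried 2.0.0M5), accessed through Spring Data Neo4j 2.3.1 in a distributed environment (hence SpringRestGraphDatabase, though against localhost for testing), I'm trying to load 7k words into 1 dictionary. But I can't get it done in less than 8 to 9 minutes on a Core i7 with 8 GB RAM and an SSD (ulimit raised to 40000).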

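I've read a lot of posts about load/insert performance over REST and tried to apply the advice I found, with no better luck. Because of my application's constraints, the BatchInserter tool doesn't seem like a good option for me.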

Can I realistically expect to load 10k nodes in seconds rather than minutes?

After reading all of that, this is the code I came up with:

// Create the Dictionary node with its properties
Map<String, Object> dicProps = new HashMap<String, Object>();
dicProps.put("locale", locale);
dicProps.put("category", category);
Dictionary dictionary = template.createNodeAs(Dictionary.class, dicProps);
Map<String, Object> wordProps = new HashMap<String, Object>();
// readFile (not shown) parses the uploaded file into Word objects
Set<Word> words = readFile(filename);
for (Word gw : words) {
  wordProps.put("txt", gw.getTxt());
  // Two template calls per word (node, then relationship), each sent to the server over REST
  Word w = template.createNodeAs(Word.class, wordProps);
  template.createRelationshipBetween(dictionary, w, Contains.class, "CONTAINS", true);
}
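A rough back-of-the-envelope check of where the time goes, assuming (not measured) that each createNodeAs call and each createRelationshipBetween call costs at least one HTTP round trip through SpringRestGraphDatabase:

```latex
\[
n_{\text{requests}} \ge 2 \times 7\,000 = 14\,000,
\qquad
\frac{480\ \text{s}}{14\,000} \approx 34\ \text{ms}
\;\text{ to }\;
\frac{540\ \text{s}}{14\,000} \approx 39\ \text{ms per request (upper bound)}
\]
```

If that assumption holds, the run is dominated by per-request overhead rather than by the work of creating 7k nodes, which is why both answers reduce the number of requests: one CSV import, or batches of 500 words (7000 / 500 = 14 batches, if each batch goes out as a single request).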

Answer 1

I solved this by creating CSV files and then having Neo4j read them. The steps are:

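Write some classes that take the input data and build CSV files from it (one file per node type, and possibly additional files used to build the relationships).
In my case I also created a servlet that lets Neo4j read the file over HTTP.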
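A minimal sketch of this first step, assuming the uploaded file has already been parsed into plain strings. The names WordCsvExport, toCsv, serve and the /words.csv path are hypothetical, and the JDK's built-in com.sun.net.httpserver.HttpServer is swapped in for the servlet mentioned above only to keep the example dependency-free:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns uploaded words into a CSV file that LOAD CSV can read,
// and serves it over HTTP (a JDK-only stand-in for the servlet in the answer).
class WordCsvExport {

    // Wrap a field in double quotes and double any embedded quote (standard CSV escaping).
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Header "txt" becomes line.txt in Cypher; then one quoted row per word.
    static String toCsv(Iterable<String> words) {
        StringBuilder sb = new StringBuilder("txt\n");
        for (String word : words) {
            sb.append(quote(word)).append('\n');
        }
        return sb.toString();
    }

    // Serve the CSV at http://<host>:<port>/words.csv; pass port 0 to pick a free port.
    static HttpServer serve(String csv, int port) throws IOException {
        byte[] body = csv.getBytes(StandardCharsets.UTF_8);
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/words.csv", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/csv; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "say \"hi\"");
        // Prints the header line followed by one quoted row per word.
        System.out.print(toCsv(words));
    }
}
```

Quoting every field costs a few bytes but means words containing commas or quotes cannot break the row structure that the import query relies on.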

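Write suitable Cypher statements that read and parse the CSV file. Here are some samples I used (if you use Spring Data, remember to add its labels too, such as _UserProfile below):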

A simple one:

load csv with headers from {fileUrl} as line 
   merge (:UserProfile:_UserProfile {email: line.email})

A more complex one:

load csv with headers from {fileUrl} as line
match (c:Calendar {calendarId: line.calendarId})
merge (a:Activity:_Activity {eventId: line.eventId})
  on create set a.eventSummary = line.eventSummary,
                a.eventDescription = line.eventDescription,
                a.eventStartDateTime = toInt(line.eventStartDateTime),
                a.eventEndDateTime = toInt(line.eventEndDateTime),
                a.eventCreated = toInt(line.eventCreated),
                a.recurringId = line.recurringId
merge (a)-[r:EXPORTED_FROM]->(c)
return count(r)
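The question's Dictionary -[:CONTAINS]->Word model fits the same "match an existing node, merge the imported nodes and relationships, return a count" shape. Below is a sketch of that adaptation, not a tested import: it assumes Neo4j 2.1 or later (LOAD CSV is not available in 1.9.4 or 2.0), that the Dictionary node already exists and carries a Dictionary label (label-based type mapping), and that the CSV has a txt header. The class name and the fileUrl/locale/category parameter names are hypothetical; the {param} syntax follows the answer, while Neo4j 4.0 and later write parameters as $param.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical adaptation of the answer's import query to Dictionary-[:CONTAINS]->Word.
class DictionaryImportQuery {

    // {param} is the Neo4j 2.x/3.x parameter syntax used in the answer; Neo4j 4+ uses $param.
    static final String CYPHER = String.join("\n",
        "LOAD CSV WITH HEADERS FROM {fileUrl} AS line",
        "MATCH (d:Dictionary {locale: {locale}, category: {category}})",
        // Merging the whole pattern creates the Word and its CONTAINS relationship only
        // when this dictionary does not already contain a Word with the same txt.
        "MERGE (d)-[:CONTAINS]->(w:Word:_Word {txt: line.txt})",
        "RETURN count(w)");

    // Parameter map to send along with CYPHER; the keys mirror the question's dicProps.
    static Map<String, Object> params(String fileUrl, String locale, String category) {
        Map<String, Object> params = new HashMap<>();
        params.put("fileUrl", fileUrl);
        params.put("locale", locale);
        params.put("category", category);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(CYPHER);
        System.out.println(params("http://localhost:8080/words.csv", "en", "general"));
    }
}
```

Merging the pattern from d, rather than merging Word on txt alone, keeps words scoped to one dictionary, so the same text in two locales stays two nodes. The query and map would then be sent as a single Cypher request with parameters, which is one round trip instead of two per word.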

Answer 2

Try the following:

Use the native Neo4j API instead of spring-data-neo4j for bulk operations.
Commit in batches, e.g. every 500 words.
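A minimal sketch of the "commit every 500 words" idea. BatchCommit, partition and forEachBatch are hypothetical names, and the callback body is a placeholder: the actual write (opening a transaction with the native API, creating the batch's Word nodes and CONTAINS relationships, committing) depends on your setup and is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: split the input into fixed-size batches and hand each
// one to a callback that writes it in a single transaction.
class BatchCommit {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    // Calls commitBatch once per batch and returns how many batches were written.
    static <T> int forEachBatch(List<T> items, int batchSize, Consumer<List<T>> commitBatch) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            commitBatch.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Collections.nCopies(7000, "word"));
        int n = forEachBatch(words, 500, batch -> {
            // Placeholder: open a transaction with the native API, create this
            // batch's Word nodes and CONTAINS relationships, then commit.
            System.out.println("committing " + batch.size() + " words");
        });
        System.out.println(n + " batches");
    }
}
```

With 7000 words and a batch size of 500 this yields 14 batches; a larger batch size means fewer commits but bigger transactions, so 500 is a starting point to tune rather than a fixed rule.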

Note: SDN adds some properties of its own (type information), and these will be missing from nodes created through the native API.

Neo4j: inserting 7k nodes is slow (Spring Data Neo4j / SpringRestGraphDatabase)
