I am trying to use APOC's periodic iterate to apply changes to my graph data model. I use the LOAD CSV command to parse text data and load articles into Neo4j 3.4.4.
USING PERIODIC COMMIT 5000 LOAD CSV WITH HEADERS
FROM 'file:///article.txt' as r FIELDTERMINATOR '\t'
MATCH (a:Article {PMID: toInt(r.PMID)})
WITH a, toLower(r.ArticleTitle) as text
WITH a, reduce(t=text, delim in [",",".","!","?",'"',":",";","'","(",")","[","]","{","}"] | replace(t,delim," ")) as text
WITH a, reduce(t=text, delim in ["/", "\\"] | replace(t, delim, " ")) as text
WITH a, filter(w in split(text, " ") where length(w) > 2) as words
SET a.words = words;
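I can create the Word nodes with the following command, which is very sensitive to the amount of data being loaded. The database currently holds 83,000 articles, and the query runs fine in a few minutes.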
MATCH (a:Article) where exists(a.words)
WITH a
FOREACH (word in a.words|
MERGE (w:Word {Name: word})
MERGE (a) -[r:contains]-> (w)
ON CREATE SET r.f = 1
ON MATCH SET r.f = r.f + 1
)
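Since each MERGE on (w:Word {Name: word}) has to look up an existing node by Name, a uniqueness constraint (which also creates an index) may reduce how sensitive this query is to data volume. A minimal sketch in Neo4j 3.x syntax, assuming no index or constraint on :Word(Name) exists yet and that there are no duplicate Word nodes already (the statement fails if there are):

```cypher
// Assumption: no constraint or index on :Word(Name) has been created so far.
// The constraint backs MERGE (w:Word {Name: word}) with an index lookup
// and prevents duplicate Word nodes with the same Name.
CREATE CONSTRAINT ON (w:Word) ASSERT w.Name IS UNIQUE;
```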
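I therefore tried running it in smaller batches with the apoc.periodic.iterate procedure. Since the APOC procedure does not allow quotation marks inside the query, I first build the filtered word array (the a.words property set above) so that it can be used to generate the nodes and relationships.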
CALL apoc.periodic.iterate('MATCH (a:Article) WHERE EXISTS(a.words) RETURN a as art','WITH {art} as a FOREACH (word in a.words | MERGE (w:Word {Name: word}) MERGE (a) -[r:contains]-> (w) ON CREATE SET r.f = 1 ON MATCH SET r.f = r.f + 1)', {batchSize:1000, parallel:true})
The query above fails on some batches with the following message:
batches: 83
total: 1233
timeTaken: 2
committedOperations: 573
failedOperations: 82
failedBatches: 82
retries: 0
errorMessages: {}
batch: {"total": 83, "committed": 1, "failed": 82, "errors": {"java.lang.NullPointerException": 82}}
operations: {"total": 1233, "committed": 573, "failed": 82, "errors": {}}
wasTerminated: false
The data is public but too large to share here, so I cannot work out which entry is failing. Moreover, every entry works with the plain Cypher query.
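One variant that may be worth trying is the same call with parallel set to false. This is only a sketch under an unverified assumption: that the NullPointerException comes from concurrent batches trying to MERGE the same Word nodes, rather than from a particular bad entry. The statements are unchanged from the failing call; only the config map differs.

```cypher
// Same two statements as the failing call; the only change is parallel:false.
// Assumption (not verified): the failures are caused by parallel batches
// contending for the same :Word nodes, not by the data itself.
CALL apoc.periodic.iterate(
  'MATCH (a:Article) WHERE EXISTS(a.words) RETURN a as art',
  'WITH {art} as a FOREACH (word in a.words | MERGE (w:Word {Name: word}) MERGE (a) -[r:contains]-> (w) ON CREATE SET r.f = 1 ON MATCH SET r.f = r.f + 1)',
  {batchSize:1000, parallel:false})
```

If this run reports failedBatches as 0, that would point to concurrency rather than to any single article as the cause.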