I have a query that reads a set of IDs from a CSV file, looks up those nodes in the database, and writes the results to a CSV. I am trying to make this query run as fast as possible, and I am wondering whether I can use apoc.periodic.iterate to perform the reads in parallel:
http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/cypher-execution/commit-batching/
I have written a query that does what I need, but really I just want to understand how to make it run as fast as possible.
Here is the current version of the query:
CALL apoc.export.csv.query('CALL apoc.load.csv(\'file:///edge.csv\') YIELD map as edge
MATCH (n:paper)
WHERE n.paper_id = edge.`From` OR n.paper_id = edge.`To`
RETURN n.paper_title',
'node.csv', {});
This query creates the desired node.csv output file, but the operation is likely to slow down considerably as edge.csv grows.
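(I assume the lookups benefit from a schema index on :paper(paper_id), so the MATCH is an index seek rather than a label scan; a minimal sketch in Neo4j 3.5 syntax, in case it is relevant:

```cypher
// Schema index on the paper_id property of :paper nodes (Neo4j 3.5 syntax),
// so that WHERE n.paper_id = ... can use an index seek.
CREATE INDEX ON :paper(paper_id);
```
)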
What I was hoping to do is something like this:
CALL apoc.periodic.iterate(
'LOAD CSV WITH HEADERS FROM \'file:///edge.csv\' as row RETURN row',
'CALL apoc.export.csv.query(\'MATCH (n:paper) WHERE n.paper_id = row.`From` OR n.paper_id = row.`To` RETURN DISTINCT(n.paper_id) AS paper_id\', \'nodePar.csv\', {})'
, {batchSize:10, iterateList:true, parallel:true, failedParams:0})
;
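My guess from the commit-batching docs is that the variable row from the outer statement is not visible inside the Cypher string passed to apoc.export.csv.query, so it might need to be passed explicitly through the export config's params option. Something like the following sketch (the $from/$to parameter names are my own, and I suspect parallel batches would still overwrite nodePar.csv):

```cypher
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM \'file:///edge.csv\' AS row RETURN row',
  // Pass the row values into the exported query as parameters,
  // since the inner query string cannot see the outer `row` variable.
  'CALL apoc.export.csv.query(
     "MATCH (n:paper)
      WHERE n.paper_id = $from OR n.paper_id = $to
      RETURN DISTINCT n.paper_id AS paper_id",
     "nodePar.csv",
     {params: {from: row.`From`, to: row.`To`}})',
  {batchSize:10, iterateList:true, parallel:true});
```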
This query runs, but it produces no output other than the following message:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch | operations | wasTerminated | failedParams |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 14463 | 144629 | 0 | 144629 | 0 | 0 | 0 | {} | {total: 14463, committed: 14463, failed: 0, errors: {}} | {total: 144629, committed: 144629, failed: 0, errors: {}} | FALSE | {} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
My main question is: can apoc.periodic.iterate be used this way to speed up this query?
And, as the edge.csv file grows, are there other ways to speed the query up?