Creating multiple indexes on multiple properties in Neo4j

Time: 2016-08-31 13:34:12

Tags: indexing neo4j

I have to create indexes for about 15 different labels and 400 different properties. Currently I have a bash script like this:

echo "CYPHER CREATE INDEX ON :Person(uuid);" | $NEO4J_HOME/bin/neo4j-shell
sleep 5
echo "CYPHER CREATE INDEX ON :Person(name);" | $NEO4J_HOME/bin/neo4j-shell
sleep 5
echo "CYPHER CREATE INDEX ON :Person(surname);" | $NEO4J_HOME/bin/neo4j-shell
sleep 5
echo "CYPHER CREATE INDEX ON :Animal(uuid);" | $NEO4J_HOME/bin/neo4j-shell
sleep 5
echo "CYPHER CREATE INDEX ON :Animal(name);" | $NEO4J_HOME/bin/neo4j-shell
sleep 5

The problem is that after a while I see this warning:

2016-08-31 13:18:16.448+0000 WARN  [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 432ms.

which eventually leads to an out-of-memory exception:

2016-08-31 11:28:54.579+0000 ERROR [o.n.k.i.a.i.IndexPopulationJob] Failed to populate index: [:GeneticVariant(mapped_gene) [provider: {key=lucene, version=1.0}
]] Java heap space
java.lang.OutOfMemoryError: Java heap space
        at org.neo4j.kernel.impl.locking.AbstractLockService.acquireNodeLock(AbstractLockService.java:52)
        at org.neo4j.kernel.impl.locking.ReentrantLockService.acquireNodeLock(ReentrantLockService.java:35)
        at org.neo4j.kernel.impl.transaction.state.NeoStoreIndexStoreView$NodeStoreScan.run(NeoStoreIndexStoreView.java:212)
        at org.neo4j.kernel.impl.api.index.BatchingMultipleIndexPopulator$BatchingStoreScan.run(BatchingMultipleIndexPopulator.java:384)
        at org.neo4j.kernel.impl.api.index.IndexPopulationJob.indexAllNodes(IndexPopulationJob.java:138)
        at org.neo4j.kernel.impl.api.index.IndexPopulationJob.run(IndexPopulationJob.java:110)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
        at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:104)
2016-08-31 11:30:52.653+0000 INFO  [o.n.k.i.a.i.IndexPopulationJob] Forcefully shutting down executor.
BatchingMultipleIndexPopulator{activeTasks=0, executor=java.util.concurrent.ThreadPoolExecutor@354e43e[Running, pool size = 3, active threads = 0, queued tasks = 0, completed tasks = 21], batchedUpdates = [org.neo4j.kernel.impl.api.index.BatchingMultipleIndexPopulator$BatchingIndexPopulation@32c99a02 - 4589 updates], queuedUpdates = 0}

I tried to work around this by adding a sleep after each command, but that is not enough. Is there a smarter way to make this work?

I am using Neo4j Server 3.0.0 on a Mac.

1 answer:

Answer 0 (score: 0)

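If you know all the indexes and uniqueness constraints your database needs, you can use the APOC plugin procedure apoc.schema.assert to ensure that they all exist. Any index or constraint that does not yet exist is created. Note: any existing index or constraint that is not specified in your call is dropped.

For example, if you want to assert that these (and only these) indexes exist: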

:Person(name)
:Person(surname)
:Animal(name)

and that these (and only these) uniqueness constraints exist:

:Person(uuid)
:Animal(uuid)

you can run this Cypher query:

CALL apoc.schema.assert(
  {Person:['name', 'surname'], Animal:['name']},
  {Person:['uuid'], Animal:['uuid']})
YIELD label, key, unique, action
RETURN *;

If the database had no indexes or constraints beforehand, the result looks like this:

╒═══════╤═══════╤══════╤══════╕
│action │key    │label │unique│
╞═══════╪═══════╪══════╪══════╡
│CREATED│name   │Animal│false │
├───────┼───────┼──────┼──────┤
│CREATED│name   │Person│false │
├───────┼───────┼──────┼──────┤
│CREATED│surname│Person│false │
├───────┼───────┼──────┼──────┤
│CREATED│uuid   │Animal│true  │
├───────┼───────┼──────┼──────┤
│CREATED│uuid   │Person│true  │
└───────┴───────┴──────┴──────┘
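
With around 400 properties, writing the two map arguments by hand is tedious, so it may help to generate them. The following is only a sketch: the helper name build_map and the "Label property" input format (one pair per line) are my own assumptions, not part of APOC or Neo4j. It only prints the query text and assumes that labels and property names are plain identifiers that need no escaping.

```shell
#!/bin/sh
# build_map (hypothetical helper): reads "Label property" pairs on stdin,
# one pair per line, and prints a Cypher map literal that groups the
# properties by label, keeping labels in first-seen order.
build_map() {
  awk -v q="'" '
    NF == 2 {
      item = q $2 q                      # quote the property name
      if ($1 in props) props[$1] = props[$1] "," item
      else { order[++n] = $1; props[$1] = item }
    }
    END {
      out = "{"
      for (i = 1; i <= n; i++)
        out = out (i > 1 ? "," : "") order[i] ":[" props[order[i]] "]"
      print out "}"
    }'
}

# Same indexes and constraints as in the example above.
indexes=$(printf '%s\n' 'Person name' 'Person surname' 'Animal name' | build_map)
constraints=$(printf '%s\n' 'Person uuid' 'Animal uuid' | build_map)

# Assemble the single procedure call and print it.
query="CALL apoc.schema.assert($indexes, $constraints) YIELD label, key, unique, action RETURN *;"
printf '%s\n' "$query"
```

In practice the pairs would come from a file rather than from printf, and the printed query can then be run like any other Cypher statement. Because the call drops anything it does not list, the generated lists need to be complete.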