如何在重载下提高Neo4j 2.0 cypher / ExecutionResult的性能?

时间:2013-12-23 20:31:28

标签: java multithreading cypher neo4j

背景 随着并发线程数量的增加,我们注意到从ExecutionResult中检索数据时性能下降。我们的生产应用程序有200个工作线程,在嵌入式模式下使用Neo4j 2.0.0社区。 例如以毫秒为单位。

  1. 线程:1,Cypher时间:0,提取时间:188
  2. 线程:10,Cypher时间:1,提取时间:188
  3. 线程:50,Cypher时间:1,提取时间:2481
  4. 线程:100,Cypher时间:1,提取时间:4466
  5. 程序的示例输出(过滤1个线程的结果):

    2013-12-23 14:39:31,137 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 1
    2013-12-23 14:39:31,137 [main] INFO  net.ahm.graph.CypherLab  - >>>> STARTED GRAPHDB
    2013-12-23 14:39:39,203 [main] INFO  net.ahm.graph.CypherLab  - >>>> CREATED NODES
    2013-12-23 14:39:43,510 [main] INFO  net.ahm.graph.CypherLab  - >>>> WARMED UP
    2013-12-23 14:39:43,510 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 0 m-secs
    2013-12-23 14:39:43,698 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 188 m-secs
    2013-12-23 14:39:43,698 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
    2013-12-23 14:39:43,698 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!
    
    
    
    2013-12-23 14:40:10,470 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 10
    ...
    2013-12-23 14:40:23,985 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 1 m-secs
    2013-12-23 14:40:25,219 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 188 m-secs
    2013-12-23 14:40:25,219 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
    2013-12-23 14:40:25,234 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!
    
    
    2013-12-23 14:41:28,850 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 50
    ...
    2013-12-23 14:41:41,781 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 1 m-secs
    2013-12-23 14:41:45,720 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 2481 m-secs
    2013-12-23 14:41:45,720 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
    2013-12-23 14:41:46,855 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!
    
    
    2013-12-23 14:44:09,267 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 100
    ...
    2013-12-23 14:44:22,077 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 1 m-secs
    2013-12-23 14:44:30,915 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 4466 m-secs
    2013-12-23 14:44:30,915 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
    2013-12-23 14:44:31,680 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!
    

    测试计划:

    package net.ahm.graph;
    
    import java.io.File;
    import java.util.Map;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    
    import org.apache.log4j.Logger;
    import org.neo4j.cypher.javacompat.ExecutionEngine;
    import org.neo4j.cypher.javacompat.ExecutionResult;
    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
    import org.neo4j.graphdb.factory.GraphDatabaseSettings;
    import org.neo4j.graphdb.schema.IndexDefinition;
    import org.neo4j.graphdb.schema.Schema;
    import org.neo4j.kernel.impl.util.FileUtils;
    import org.neo4j.kernel.impl.util.StringLogger;
    
    public class CypherLab {
        private static final Logger LOG = Logger.getLogger(CypherLab.class);
        private final static int CONCURRENCY = 100;
    
        public static void main(String[] args) throws Exception {
            FileUtils.deleteRecursively(new File("graphdb"));
            final GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder("graphdb")
                    .setConfig(GraphDatabaseSettings.use_memory_mapped_buffers, "true").setConfig(GraphDatabaseSettings.cache_type, "strong")
                    .newGraphDatabase();
            registerShutdownHook(graphDb);
            LOG.info(">>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: " + CONCURRENCY);
            LOG.info(">>>> STARTED GRAPHDB");
            createIndex("Parent", "name", graphDb);
            createIndex("Child", "name", graphDb);
            try (Transaction tx = graphDb.beginTx()) {
                Node parent = graphDb.createNode(DynamicLabel.label("Parent"));
                parent.setProperty("name", "parent");
                for (int i = 0; i < 50000; i++) {
                    Node child = graphDb.createNode(DynamicLabel.label("Child"));
                    child.setProperty("name", "child" + i);
                    parent.createRelationshipTo(child, RelationshipTypes.PARENT_CHILD);
                }
                tx.success();
            }
            LOG.info(">>>> CREATED NODES");
            final ExecutionEngine engine = new ExecutionEngine(graphDb, StringLogger.SYSTEM);
            for (int i = 0; i < 10; i++) {
                try (Transaction tx = graphDb.beginTx()) {
                    ExecutionResult result = engine.execute("match (n:Parent)-[:PARENT_CHILD]->(m:Child) return n.name, m.name");
                    for (Map<String, Object> row : result) {
                        assert ((String) row.get("n.name") != null);
                        assert ((String) row.get("m.name") != null);
                    }
                    tx.success();
                }
            }
            LOG.info(">>>> WARMED UP");
            ExecutorService es = Executors.newFixedThreadPool(CONCURRENCY);
            final CountDownLatch cdl = new CountDownLatch(CONCURRENCY);
            for (int i = 0; i < CONCURRENCY; i++) {
                es.execute(new Runnable() {
                    @Override
                    public void run() {
                        try (Transaction tx = graphDb.beginTx()) {
                            long time = System.currentTimeMillis();
                            ExecutionResult result = engine.execute("match (n:Parent)-[:PARENT_CHILD]->(m:Child) return n.name, m.name");
                            LOG.info(">>>> CYPHER TOOK: " + (System.currentTimeMillis() - time) + " m-secs");
                            int count = 0;
                            time = System.currentTimeMillis();
                            for (Map<String, Object> row : result) {
                                assert ((String) row.get("n.name") != null);
                                assert ((String) row.get("m.name") != null);
                                count++;
                            }
                            LOG.info(">>>> GETTING RESULTS TOOK: " + (System.currentTimeMillis() - time) + " m-secs");
                            tx.success();
                            LOG.info(">>>> CYPHER RETURNED ROWS: " + count);
                        } catch (Throwable t) {
                            LOG.error(t);
                        } finally {
                            cdl.countDown();
                        }
                    }
                });
            }
            cdl.await();
            es.shutdown();
        }
    
        private static void createIndex(String label, String propertyName, GraphDatabaseService graphDb) {
            IndexDefinition indexDefinition;
            try (Transaction tx = graphDb.beginTx()) {
                Schema schema = graphDb.schema();
                indexDefinition = schema.indexFor(DynamicLabel.label(label)).on(propertyName).create();
                tx.success();
            }
            try (Transaction tx = graphDb.beginTx()) {
                Schema schema = graphDb.schema();
                schema.awaitIndexOnline(indexDefinition, 10, TimeUnit.SECONDS);
                tx.success();
            }
        }
    
        private static void registerShutdownHook(final GraphDatabaseService graphDb) {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    LOG.info("### GRAPHDB SHUTDOWNHOOK INVOKED !!!");
                    graphDb.shutdown();
                }
            });
        }
    
        private enum RelationshipTypes implements RelationshipType {
            PARENT_CHILD
        }
    }
    

1 个答案:

答案 0 :(得分:1)

commit is merged in时应该更好。哪个将作为2.0.1的一部分发布 还有一些其他较小的阻塞点。

您是否可以尝试将您的网络服务器线程限制为核心次数(或核心数量* 2)?看看这有帮助吗?

我的理解是,在预热并且在缓存中具有热数据集之后,它仅受CPU限制,而不再是I / O绑定的读取。所以你用太多的线程饿死了CPU和工人。

如果我使用8个和100个核心运行测试,我会获得这些分发来执行查询并获取50k结果:

  • 8个主题:50%百分位数500毫秒和90%650毫秒
  • 100个主题:50%百分位数为2600毫秒,90%为6000毫秒

代码和详细的直方图:https://gist.github.com/jexp/a164f6cf9686b8125872