Question

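We built a 3-node cluster in a test environment and use a Neo4j-JDBC connection to save JSON data into Neo4j.

When creating only 2000 nodes and 2000 relationships from the JSON stats: total time taken to save the topology data in Neo4j: 456688 ms, links size: 2000, node size: 2000.

Saving without checking whether nodes/relationships are duplicates (the checkVertex and checkRelation methods removed):

Total time taken to save the topology data in Neo4j: 446979 ms, links size: 2000, node size: 4000 (since we do not check for duplicates, two nodes are created per link).

Code: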

public Connection getConnection(String masterNodeIp, String password) throws Exception {
    return DriverManager.getConnection("jdbc:neo4j:http://" + masterNodeIp + "/?user=neo4j,password=" + password);
}

// Iterating over the links, the source and target nodes are added.

try {
    for (Links link : topology.getL2links()) {
        if (conn != null) {
            long srcId = etGraphIdByUniquenessOfOrphan(clientId, link.getSrcMgmtIP());
            GraphId srcGraphId = prepareGraphId(srcId, "DEVICE");
            long tgtId = etGraphIdByUniquenessOfOrphan(clientId, link.getTgtMgmtIP());
            GraphId tgtGraphId = prepareGraphId(tgtId, "DEVICE");
            String srcQuery = createNode(conn, link, false, clientId, discProfileId, srcGraphId);
            if (srcQuery != null && !srcQuery.isEmpty())
                stmt.execute(srcQuery);
            String tgtQuery = createNode(conn, link, true, clientId, discProfileId, tgtGraphId);
            if (tgtQuery != null && !tgtQuery.isEmpty())
                stmt.execute(tgtQuery);
            String relationQuery = processRelation(conn, link, srcGraphId, tgtGraphId);
            if (relationQuery != null && !relationQuery.isEmpty())
                stmt.execute(relationQuery);
        }
    }
} catch (Exception e) {
    System.out.println("Exception in processJsonData ::: " + e.getMessage());
    throw e;
} finally {
    stmt.close();
    conn.close();
}

/// Before creating a node, check whether it already exists, to avoid duplicates.

private boolean checkVertex(Connection conn, String ip, String hostName, long clientId, long discPId, GraphId graphId) throws Exception{
    Statement stmt = null;
    ResultSet rs = null;
    boolean result=false;
    try {           
        stmt = conn.createStatement();          
        StringBuffer queryBuffer = new StringBuffer();
        queryBuffer.append(" MATCH (node) WHERE node.id ='"+graphId.getId()+"' AND node.sourceType = '"+graphId.getSourceType()+"'");
        queryBuffer.append(" RETURN node");
        rs = (ResultSet) stmt.executeQuery(queryBuffer.toString());
        while(rs.next()) {
            result=true;
            break;
        }
    } catch(Exception e) {
        System.out.println("Exception in fetching node ::: "+e.getMessage());
        throw e;
    } finally {
        rs.close();
        stmt.close();
    }

    return result;
}

/// Before creating a relationship, it is also checked for duplicates.

private boolean checkRelation(Connection conn, Links link, GraphId srcGraphId, GraphId tgtGraphId) throws SQLException {
    Statement stmt = null;
    ResultSet rs = null;
    boolean result=false;
    try {
        stmt = conn.createStatement();          
        StringBuffer queryBuffer = new StringBuffer();
        queryBuffer.append(" MATCH (src:resource)-[r:topology]->(tgt:resource) WHERE src.id='"+srcGraphId.getId()
            +"' AND tgt.id='"+tgtGraphId.getId()+"' AND r.srcInt='"+link.getSrcInt()+"'AND r.tgtInt='"+link.getTgtInt()+"'");
        queryBuffer.append(" RETURN r");
        rs=(ResultSet) stmt.executeQuery(queryBuffer.toString());
        while(rs.next()) {
            result=true;
            break;
        }
    }
    catch(Exception e) {
        System.out.println("Exception in fetching node ::: "+e.getMessage());
    } finally {
        rs.close();
        stmt.close();
    }
    return result;
}

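We created indexes for the duplicate-check queries, but performance is still slow.

Please also let us know how to use the "node key" unique constraint at the Java level so that we can skip the checkVertex query. We tried catching "constraintViolationexception" and logging it instead of rethrowing it, but an exception was still thrown and no nodes were saved.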

Answer 1

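There are a lot of things you can improve:

For bulk data imports, use the Java driver directly; JDBC adds a layer of indirection.
Use parameters!
Use batching, either via UNWIND or by sending multiple prepared statements as a batch.
Don't construct queries with literal values.
Make sure you have an index/constraint on your keys. Your queries don't use any index because you don't provide any labels!
If you don't want constraint exceptions, use MERGE.
Never use StringBuffer.
Use try-with-resources.
Use executeUpdate.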

For batching: https://medium.com/@mesirii/5-tips-tricks-for-fast-batched-updates-of-graph-structures-with-neo4j-and-cypher-73c7f693c8cc
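To make the UNWIND idea concrete, here is a rough sketch of how the links could be grouped into batches so that each round trip merges many rows at once. The class name, the row keys (srcId, tgtId, srcInt, tgtInt) and the batch size are made up for illustration; the resource and topology labels are taken from the question's checkRelation query. The $rows parameter syntax is an assumption about the server version (older servers use {rows} instead).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UnwindBatchSketch {

    // One statement handles a whole batch: every element of $rows is one link.
    // MERGE matches an existing node/relationship or creates it, so no
    // separate checkVertex/checkRelation round trip is needed.
    static final String UNWIND_MERGE =
        "UNWIND $rows AS row "
      + "MERGE (src:resource {id: row.srcId}) "
      + "MERGE (tgt:resource {id: row.tgtId}) "
      + "MERGE (src)-[:topology {srcInt: row.srcInt, tgtInt: row.tgtInt}]->(tgt)";

    // Builds one parameter row; the key names are hypothetical and only need
    // to match the row.* references in UNWIND_MERGE.
    static Map<String, Object> row(String srcId, String tgtId, String srcInt, String tgtInt) {
        Map<String, Object> row = new HashMap<>();
        row.put("srcId", srcId);
        row.put("tgtId", tgtId);
        row.put("srcInt", srcInt);
        row.put("tgtInt", tgtInt);
        return row;
    }

    // Splits the rows into consecutive batches of at most `size` elements.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }
}
```

Each batch would then be sent as the single rows parameter, for example with the Java driver via session.run(UnwindBatchSketch.UNWIND_MERGE, Collections.singletonMap("rows", batch)). With 2000 links and a batch size of, say, 500, that is 4 statements instead of several thousand.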

For parameters: http://neo4j-contrib.github.io/neo4j-jdbc/#_minimum_viable_snippet
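If you stay on JDBC, the advice about parameters, MERGE, try-with-resources and executeUpdate could be combined along these lines. This is only a sketch: the Link class is a hypothetical stand-in for the question's Links/GraphId objects, ids are assumed to be stored as strings (the question quotes them in its queries), the constraint statement uses Neo4j 3.x Enterprise syntax, and the ? placeholders assume the driver accepts standard JDBC markers (the linked snippet uses numbered placeholders instead).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class ParameterizedMergeSketch {

    // Run once, outside the import loop (assumed Neo4j 3.x Enterprise syntax).
    // A node key requires both properties to exist and be unique together.
    static final String NODE_KEY_CONSTRAINT =
        "CREATE CONSTRAINT ON (n:resource) ASSERT (n.id, n.sourceType) IS NODE KEY";

    // Labels plus parameters: the lookup can use the constraint's index, and
    // MERGE matches or creates instead of raising a constraint violation.
    static final String MERGE_LINK =
        "MERGE (src:resource {id: ?, sourceType: ?}) "
      + "MERGE (tgt:resource {id: ?, sourceType: ?}) "
      + "MERGE (src)-[:topology {srcInt: ?, tgtInt: ?}]->(tgt)";

    // Hypothetical minimal stand-in for the question's link data.
    static final class Link {
        final String srcId, tgtId, srcInt, tgtInt;

        Link(String srcId, String tgtId, String srcInt, String tgtInt) {
            this.srcId = srcId;
            this.tgtId = tgtId;
            this.srcInt = srcInt;
            this.tgtInt = tgtInt;
        }
    }

    // Parameter values in the order of the ? placeholders in MERGE_LINK.
    static String[] paramsFor(Link link, String sourceType) {
        return new String[] {
            link.srcId, sourceType, link.tgtId, sourceType, link.srcInt, link.tgtInt
        };
    }

    // One prepared statement is reused for every link; try-with-resources
    // closes it even when an exception is thrown.
    static int saveLinks(Connection conn, List<Link> links) throws SQLException {
        int saved = 0;
        try (PreparedStatement ps = conn.prepareStatement(MERGE_LINK)) {
            for (Link link : links) {
                String[] params = paramsFor(link, "DEVICE");
                for (int i = 0; i < params.length; i++) {
                    ps.setString(i + 1, params[i]); // JDBC indexes start at 1
                }
                ps.executeUpdate();
                saved++;
            }
        }
        return saved;
    }
}
```

Because both MERGE clauses supply the full key (id and sourceType), the separate checkVertex and checkRelation queries become unnecessary, and there is no exception to catch for nodes that already exist.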

Performance is slow while saving data in Neo4j Enterprise Edition (trial version)
