Bulk loading data from MarkLogic into an RDBMS using DMSDK

Time: 2019-05-14 04:41:25

Tags: java marklogic marklogic-9

I need to insert a large amount of data from MarkLogic into an RDBMS using DMSDK.

Below is a sample of my code:

ArrayList<ArrayList<String>> batch = new ArrayList<ArrayList<String>>();
DatabaseClient client = DatabaseClientFactory.newClient(config.getmlHost(), config.getmlPort(), new DatabaseClientFactory.BasicAuthContext(dbConfig.getuser(), dbConfig.getpassword()));
QueryManager queryMgr = client.newQueryManager();
StructuredQueryBuilder sb = queryMgr.newStructuredQueryBuilder();
StructuredQueryDefinition criteria = sb.and(sb.collection("collection1"), sb.collection("collection2"));
DataMovementManager dmm = client.newDataMovementManager();
QueryBatcher batcher = dmm.newQueryBatcher(criteria)
        .withBatchSize(10)
        .withThreadCount(12)
        .onUrisReady(
                        new ExportListener()
                        .onDocumentReady(doc -> {
                    logger.info("URI received : " + doc.getUri());
                    try {
                        //Extract the column values from the XML document and add them to the list used to build the batch
                        ArrayList<String> getDataXml = new GetDataXml().GetDatafromXml(doc.getContent(new DOMHandle()),
                                dbuilder, xPath, ColumnNames);
                        batch.add(getDataXml);

                    } catch (Exception e) {
                        logger.error("Error in the Code", e);
                    }
                })).onQueryFailure(exception -> {
                    logger.error(exception);
                });
        dmm.startJob(batcher);
        batcher.awaitCompletion();
        dmm.stopJob(batcher);
        Class.forName("Driver Name");

        //connecting to RDBMS
        Connection conn = DriverManager.getConnection(DB_URL, USER, PASS);
        PreparedStatement pstmt = conn.prepareStatement("INSERT INTO DBNAME VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)");

        //Creating Batches PreparedStatement.addBatch()
        for(ArrayList<String> eachObject : batch) {             
            createPreparedStatement(pstmt, eachObject).addBatch();
        }

        //Executing the accumulated batch
        int[] result = pstmt.executeBatch();
        logger.info("Total Records Inserted " + result.length);
        oracle.closeConnect(oracleConn);

public PreparedStatement createPreparedStatement(PreparedStatement pstmt, ArrayList<String> eachObject)
            throws SQLException {
        for (int i = 0; i < eachObject.size(); i++) {
            pstmt.setString(i + 1, eachObject.get(i));
        }
        return pstmt;
    }

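This code only fetches the data from MarkLogic; it does not insert into the RDBMS after a batch completes. Is there anything I am missing in my code? Thanks in advance.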

1 answer:

Answer 0 (score: 1)

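Consider creating a prepared statement before starting the job and, in the onDocumentReady() listener:

  1. extracting one or more values from the document,
  2. setting the placeholders on the prepared statement to the values, and
  3. executing the prepared statement.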
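The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's own code: `JdbcRowWriter` and its methods are hypothetical names, and the helper deliberately has no MarkLogic dependency. Because the question sets `withThreadCount(12)`, the listener can be called from several threads at once, so the sketch serializes access to the shared statement with `synchronized` (a plain `ArrayList` or `PreparedStatement` should not be assumed safe for concurrent use).

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/**
 * Hypothetical helper (not part of DMSDK or JDBC): writes one row per
 * document as soon as the document has been read, instead of collecting
 * every row in memory first.
 */
class JdbcRowWriter {
    private final PreparedStatement pstmt; // created before the job starts
    private int rowsWritten = 0;

    JdbcRowWriter(PreparedStatement pstmt) {
        this.pstmt = pstmt;
    }

    /**
     * Binds the values to the placeholders and executes the statement.
     * Synchronized because listener callbacks may run on several threads.
     */
    synchronized void write(List<String> values) throws SQLException {
        for (int i = 0; i < values.size(); i++) {
            pstmt.setString(i + 1, values.get(i)); // JDBC placeholders are 1-based
        }
        pstmt.executeUpdate();
        rowsWritten++;
    }

    synchronized int getRowsWritten() {
        return rowsWritten;
    }
}
```

In the question's code, this would mean opening the connection and preparing the statement before `dmm.startJob(batcher)`, calling something like `writer.write(getDataXml)` where the listener currently calls `batch.add(getDataXml)`, and closing the statement and connection after `batcher.awaitCompletion()`.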

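The strategy of accumulating all of the documents in an array has the downside that the array could exhaust all available memory, and throughput should be higher if the database operations are interleaved with the reads.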
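If executing one statement per document turns out to be too chatty, a possible middle ground (my own variation, not something the answer proposes) is to keep using JDBC batching but flush every N rows, so that memory stays bounded and the inserts are still interleaved with the reads. A minimal sketch, assuming a hypothetical `BoundedBatchWriter` class and a caller-chosen `flushSize`:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/**
 * Hypothetical helper: queues rows with addBatch() and sends them with
 * executeBatch() every flushSize rows, so at most flushSize rows are
 * pending at any time.
 */
class BoundedBatchWriter {
    private final PreparedStatement pstmt;
    private final int flushSize;
    private int pending = 0; // rows queued but not yet sent
    private int flushed = 0; // rows already sent to the database

    BoundedBatchWriter(PreparedStatement pstmt, int flushSize) {
        if (flushSize < 1) {
            throw new IllegalArgumentException("flushSize must be at least 1");
        }
        this.pstmt = pstmt;
        this.flushSize = flushSize;
    }

    /** Queues one row and flushes once flushSize rows are pending. */
    synchronized void add(List<String> values) throws SQLException {
        for (int i = 0; i < values.size(); i++) {
            pstmt.setString(i + 1, values.get(i));
        }
        pstmt.addBatch();
        pending++;
        if (pending >= flushSize) {
            flush();
        }
    }

    /** Sends any queued rows; call once more after the job completes. */
    synchronized void flush() throws SQLException {
        if (pending == 0) {
            return;
        }
        pstmt.executeBatch();
        flushed += pending;
        pending = 0;
    }

    synchronized int getPending() {
        return pending;
    }

    synchronized int getFlushed() {
        return flushed;
    }
}
```

The final `flush()` after `batcher.awaitCompletion()` matters: without it, the last partial batch would never reach the database.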

Hope that helps,