I need to insert a large amount of data from MarkLogic into an RDBMS using DMSDK.
Here is a sample of my code:
// onDocumentReady runs on several batcher threads, so the shared list must be thread-safe
List<ArrayList<String>> batch = Collections.synchronizedList(new ArrayList<>());
DatabaseClient client = DatabaseClientFactory.newClient(config.getmlHost(), config.getmlPort(),
        new DatabaseClientFactory.BasicAuthContext(dbConfig.getuser(), dbConfig.getpassword()));
QueryManager queryMgr = client.newQueryManager();
StructuredQueryBuilder sb = queryMgr.newStructuredQueryBuilder();
StructuredQueryDefinition criteria = sb.and(sb.collection("collection1"), sb.collection("collection2"));
DataMovementManager dmm = client.newDataMovementManager();
QueryBatcher batcher = dmm.newQueryBatcher(criteria)
        .withBatchSize(10)
        .withThreadCount(12)
        .onUrisReady(
                new ExportListener()
                        .onDocumentReady(doc -> {
                            logger.info("URI received : " + doc.getUri());
                            try {
                                // Extract the column values from the XML document and queue them as one row
                                ArrayList<String> getDataXml = new GetDataXml().GetDatafromXml(doc.getContent(new DOMHandle()),
                                        dbuilder, xPath, ColumnNames);
                                batch.add(getDataXml);
                            } catch (Exception e) {
                                logger.error("Error in the Code", e);
                            }
                        }))
        .onQueryFailure(exception -> {
            logger.error(exception);
        });
dmm.startJob(batcher);
batcher.awaitCompletion();
dmm.stopJob(batcher);
Class.forName("Driver Name");
// Connect to the RDBMS
Connection conn = DriverManager.getConnection(DB_URL, USER, PASS);
PreparedStatement pstmt = conn.prepareStatement("INSERT INTO DBNAME VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)");
// Queue one INSERT per row with PreparedStatement.addBatch()
for (ArrayList<String> eachObject : batch) {
    createPreparedStatement(pstmt, eachObject).addBatch();
}
// Send the whole batch to the RDBMS
int[] result = pstmt.executeBatch();
logger.info("Total Records Inserted " + result.length);
// Release the JDBC resources opened above
pstmt.close();
conn.close();
// Binds each value of the row to the matching 1-based statement parameter
public PreparedStatement createPreparedStatement(PreparedStatement pstmt, ArrayList<String> eachObject)
        throws SQLException {
    for (int i = 0; i < eachObject.size(); i++) {
        pstmt.setString(i + 1, eachObject.get(i));
    }
    return pstmt;
}
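This code only fetches the data from MarkLogic and does not insert anything into the RDBMS after the first batch completes. Is there something I am missing in my code? Thanks in advance.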
Answer 0 (score: 1)
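Consider creating the prepared statement before starting the job and, in the onDocumentReady() listener, adding each document's row to that statement's batch.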
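A minimal sketch of that idea follows, using only standard JDBC. The class `SharedStatementBatch` and its methods `addRow` and `executeAll` are hypothetical names introduced here for illustration; they are not part of DMSDK or JDBC. The sketch assumes the statement is prepared before `dmm.startJob(batcher)` and that every column is bound as a string, as in the question.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper: funnels rows from the listener threads into one PreparedStatement. */
final class SharedStatementBatch {
    private final PreparedStatement pstmt;

    SharedStatementBatch(PreparedStatement pstmt) {
        this.pstmt = pstmt;
    }

    /**
     * Binds one row and queues it with addBatch().
     * Synchronized because interleaved setString calls from
     * different listener threads would mix values between rows.
     */
    synchronized void addRow(List<String> row) throws SQLException {
        for (int i = 0; i < row.size(); i++) {
            pstmt.setString(i + 1, row.get(i)); // JDBC parameters are 1-based
        }
        pstmt.addBatch();
    }

    /** Sends everything queued so far; intended to be called once the job has completed. */
    synchronized int[] executeAll() throws SQLException {
        return pstmt.executeBatch();
    }
}
```

In the question's code, the onDocumentReady lambda would then call `addRow(getDataXml)` on a shared instance instead of `batch.add(getDataXml)`, and `executeAll()` would run after `batcher.awaitCompletion()`. The existing `catch (Exception e)` in the lambda already covers the `SQLException`.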
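The drawback of accumulating all of the documents in an array is that the array could exhaust all available memory. Throughput should also be higher if the database operations are interleaved with the reads.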
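One way to interleave the writes and keep memory bounded is to execute the JDBC batch every N rows rather than once at the end. The sketch below is an assumption about how that could look, not something from the original post: `FlushingBatch`, `flushSize`, `addRow`, and `finish` are hypothetical names, and the returned total counts rows submitted to the driver, not rows the database confirmed.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper: executes the JDBC batch every flushSize rows so few rows are held at once. */
final class FlushingBatch {
    private final PreparedStatement pstmt;
    private final int flushSize;
    private int pending = 0;   // rows queued since the last executeBatch()
    private long submitted = 0; // rows handed to the driver so far

    FlushingBatch(PreparedStatement pstmt, int flushSize) {
        if (flushSize < 1) {
            throw new IllegalArgumentException("flushSize must be >= 1");
        }
        this.pstmt = pstmt;
        this.flushSize = flushSize;
    }

    /** Queues one row; writes to the database as soon as flushSize rows are pending. */
    synchronized void addRow(List<String> row) throws SQLException {
        for (int i = 0; i < row.size(); i++) {
            pstmt.setString(i + 1, row.get(i)); // JDBC parameters are 1-based
        }
        pstmt.addBatch();
        pending++;
        if (pending >= flushSize) {
            flush();
        }
    }

    /** Writes any remaining rows and returns the total number of rows submitted. */
    synchronized long finish() throws SQLException {
        if (pending > 0) {
            flush();
        }
        return submitted;
    }

    private void flush() throws SQLException {
        pstmt.executeBatch();
        submitted += pending;
        pending = 0;
    }
}
```

With this shape the inserts happen while the QueryBatcher is still reading documents, and at most `flushSize` rows are pending at any moment. A suitable `flushSize` depends on the driver and row width and would need to be measured.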
Hope that helps.