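I am trying to build a Java application that can stream very large result sets of arbitrary SQL SELECT queries into JSONL files. The target is SQL Server, but I would like it to work with any JDBC DataSource. In Python this would be easy: treat the SQL client result as a generator and call json.dumps() on each row. In this code, however, everything seems to be loaded into memory before anything is written out, which typically ends in heap and garbage collection errors. The queries I need to run are very large and can bring back as much as 10 GB of raw data. Execution time is not the primary concern, as long as it works every time.

I have tried calling flush after every row (which is ridiculous). That seems to help with small datasets but not with large ones. Can anyone suggest a strategy I can use to pull this off easily?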
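In my SQL client class I use Apache DbUtils QueryRunner and MapListHandler to create a list of Maps, which gives me the flexibility I need (as opposed to more traditional approaches in Java, which require specifying the schema and types):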
    public List<Map<String, Object>> query(String queryText) {
        try {
            DbUtils.loadDriver("com.microsoft.sqlserver.jdbc.Driver");
            // this function just sets up all the connection properties. Omitted for clarity
            DataSource ds = this.initDataSource();
            StatementConfiguration sc = new StatementConfiguration.Builder().fetchSize(10000).build();
            QueryRunner queryRunner = new QueryRunner(ds, sc);
            MapListHandler handler = new MapListHandler();
            return queryRunner.query(queryText, handler);
        } catch (Exception e) {
            logger.error(e.getMessage());
            e.printStackTrace();
            return null;
        }
    }
The JsonLOutputWriter class:
    JsonLOutputWriter(String filename) {
        GsonBuilder gsonBuilder = new GsonBuilder();
        gsonBuilder.serializeNulls();
        this.gson = gsonBuilder.create();
        try {
            this.writer = new PrintWriter(new File(filename), ENCODING);
        } catch (FileNotFoundException | UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    void writeRow(Map row) {
        this.writer.println(this.gson.toJson(row));
    }

    void flush() {
        this.writer.flush();
    }
The main method:
    JsonLOutputWriter writer = new JsonLOutputWriter(outputFile);
    for (Map row : client.query(inputSql)) {
        writer.writeRow(row);
    }
    writer.flush();
Answer 0 (score: 1)
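Basically, this cannot be done with DbUtils out of the box. I got rid of QueryRunner and MapListHandler, since the handler builds an ArrayList holding every row. Instead of a pull-based design I made it push-based: I created a very similar MyQueryRunner (CustomQueryRunner in the code below) that takes a MyRowHandler (RowHandler below) and, rather than returning a collection, just iterates over the ResultSet and calls my output function.

I'm sure there are more elegant ways to do this, such as returning some sort of row buffer, but this is the 80/20 solution I needed, and it works for large datasets.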
RowHandler
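    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.Map;
    import org.apache.commons.dbutils.BasicRowProcessor;
    import org.apache.commons.dbutils.RowProcessor;

    public class RowHandler {
        private static final RowProcessor ROW_PROCESSOR = new BasicRowProcessor();
        private final JsonLOutputWriter writer;

        public RowHandler(JsonLOutputWriter writer) {
            this.writer = writer;
        }

        // Writes each row as soon as it is read, so only one row is held in memory at a time.
        int handle(ResultSet rs) throws SQLException {
            int count = 0;
            while (rs.next()) {
                writer.writeRow(this.handleRow(rs));
                count++;
            }
            return count;
        }

        // Converts the current row of the ResultSet into a column-name-to-value map.
        protected Map<String, Object> handleRow(ResultSet rs) throws SQLException {
            return ROW_PROCESSOR.toMap(rs);
        }
    }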
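The RowHandler above relies on DbUtils' BasicRowProcessor. The same push-based loop can also be sketched with plain JDBC, for cases where the DbUtils dependency is not wanted. This is a minimal sketch, not the answer's code: `RowStreamer`, `toMap` and `stream` are hypothetical names, and the sink is any `Consumer` (for example `writer::writeRow`). Keys here are the column labels exactly as the driver reports them, which may differ from how BasicRowProcessor.toMap treats key case.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper: pushes each row to a sink instead of collecting rows in a List. */
final class RowStreamer {
    private RowStreamer() {}

    /** Copies the current row into a map keyed by column label, preserving column order. */
    static Map<String, Object> toMap(ResultSet rs, ResultSetMetaData meta) throws SQLException {
        int columns = meta.getColumnCount();
        Map<String, Object> row = new LinkedHashMap<>(columns * 2);
        for (int i = 1; i <= columns; i++) {   // JDBC column indexes start at 1
            row.put(meta.getColumnLabel(i), rs.getObject(i));
        }
        return row;
    }

    /** Iterates the ResultSet once, hands every row to the sink, and returns the row count. */
    static long stream(ResultSet rs, Consumer<Map<String, Object>> sink) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        long count = 0;
        while (rs.next()) {
            sink.accept(toMap(rs, meta));      // the row becomes garbage once the sink returns
            count++;
        }
        return count;
    }
}
```

Because each map is handed to the sink and then dropped, memory use on the Java side stays proportional to one row (plus whatever the driver buffers for its fetch size), not to the whole result set.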
CustomQueryRunner
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import javax.sql.DataSource;
    import org.apache.commons.dbutils.AbstractQueryRunner;
    import org.apache.commons.dbutils.StatementConfiguration;

    class CustomQueryRunner extends AbstractQueryRunner {
        private final RowHandler rh;

        CustomQueryRunner(DataSource ds, StatementConfiguration stmtConfig, RowHandler rh) {
            super(ds, stmtConfig);
            this.rh = rh;
        }

        // Runs the query on a connection from the DataSource and returns the number of rows handled.
        int query(String sql) throws SQLException {
            Connection conn = this.prepareConnection();
            return this.query(conn, true, sql);
        }

        private int query(Connection conn, boolean closeConn, String sql, Object... params)
                throws SQLException {
            if (conn == null) {
                throw new SQLException("Null connection");
            }
            PreparedStatement stmt = null;
            ResultSet rs = null;
            int count = 0;
            try {
                stmt = this.prepareStatement(conn, sql);
                this.fillStatement(stmt, params);
                rs = this.wrap(stmt.executeQuery());
                // Push each row to the handler instead of building a collection.
                count = rh.handle(rs);
            } catch (SQLException e) {
                this.rethrow(e, sql, params);
            } finally {
                try {
                    close(rs);
                } finally {
                    close(stmt);
                    if (closeConn) {
                        close(conn);
                    }
                }
            }
            return count;
        }
    }
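One thing the code above leaves open is who closes the output file: the JsonLOutputWriter shown in the question exposes flush() but no close(). A possible companion to the push-based runner is a sink that implements AutoCloseable, so try-with-resources flushes and closes the file even when the query fails partway. This is only a sketch under assumptions: `JsonLinesSink` and `rowsWritten` are hypothetical names, and the JSON encoder is passed in as a function so the block has no dependencies (with Gson it could be `row -> gson.toJson(row)`).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sink: writes one encoded row per line and owns the underlying writer. */
final class JsonLinesSink implements Consumer<Map<String, Object>>, AutoCloseable {
    private final BufferedWriter out;
    private final Function<Map<String, Object>, String> encoder;
    private long rows;

    JsonLinesSink(Writer target, Function<Map<String, Object>, String> encoder) {
        this.out = new BufferedWriter(target);   // buffering makes per-row flush() unnecessary
        this.encoder = encoder;
    }

    /** Encodes the row and appends it as a single line. */
    @Override
    public void accept(Map<String, Object> row) {
        try {
            out.write(encoder.apply(row));
            out.write('\n');
            rows++;
        } catch (IOException e) {
            // Consumer.accept cannot throw checked exceptions, so wrap the failure.
            throw new UncheckedIOException(e);
        }
    }

    long rowsWritten() {
        return rows;
    }

    /** Flushes any buffered lines and closes the underlying writer. */
    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Opened in a try-with-resources block around the query call, such a sink would be passed to whatever iterates the ResultSet, in place of the explicit writer.flush() at the end of the main method.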