Consider the following method, which reads data from a data structure (InteractionNetwork) and writes it to a table in an SQLite database using the SQLite-JDBC driver:
private void loadAnnotations(InteractionNetwork network) throws SQLException {
    PreparedStatement insertAnnotationsQuery =
        connection.prepareStatement(
            "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
    PreparedStatement getProteinIdQuery =
        connection.prepareStatement(
            "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
    connection.setAutoCommit(false);
    for (common.Protein protein : network.get_protein_vector()) {
        /* Get ProteinId for the current protein from another table and
           insert the value into the prepared statement. */
        getProteinIdQuery.setString(1, protein.get_primary_id());
        ResultSet result = getProteinIdQuery.executeQuery();
        result.next();
        insertAnnotationsQuery.setLong(2, result.getLong(1));
        /* Extract all the other data and add all the tuples to the batch. */
    }
    insertAnnotationsQuery.executeBatch();
    connection.commit();
    connection.setAutoCommit(true);
}
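This code works fine: the program runs in about 30 seconds and uses about 80 MB of heap space on average. Because the code looks ugly, I wanted to refactor it. The first thing I did was move the declaration of getProteinIdQuery into the loop: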
private void loadAnnotations(InteractionNetwork network) throws SQLException {
    PreparedStatement insertAnnotationsQuery =
        connection.prepareStatement(
            "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
    connection.setAutoCommit(false);
    for (common.Protein protein : network.get_protein_vector()) {
        /* Get ProteinId for the current protein from another table and
           insert the value into the prepared statement. */
        PreparedStatement getProteinIdQuery = // <--- moved declaration of statement here
            connection.prepareStatement(
                "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
        getProteinIdQuery.setString(1, protein.get_primary_id());
        ResultSet result = getProteinIdQuery.executeQuery();
        result.next();
        insertAnnotationsQuery.setLong(2, result.getLong(1));
        /* Extract all the other data and add all the tuples to the batch. */
    }
    insertAnnotationsQuery.executeBatch();
    connection.commit();
    connection.setAutoCommit(true);
}
When I run the code now, it needs about 130 MB of heap space and takes forever to run. Can anyone explain this strange behavior?
Answer 0 (score: 2)
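Whether the first snippet looks ugly is a matter of taste, I guess ;-)...
However, the reason the second snippet takes longer (IMHO) is that every iteration of the for loop now creates a new PreparedStatement instance (getProteinIdQuery), whereas in the first snippet you reused the prepared statement, using it the way it is meant to be used: instantiate it once, then supply the appropriate values on each pass.
At least, that's my opinion... Jan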
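
The reuse described above can be sketched roughly as follows. This is only an illustration, not the asker's code: the class name ProteinIdLookup and the method lookupIds are hypothetical, and the accession numbers are passed in as plain strings to keep the example self-contained. The SQL text is the one from the question. The sketch also closes the statement and each result set with try-with-resources; leaving them open inside a loop may well contribute to the extra heap usage, although nothing in the question confirms that.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not part of the original program.
final class ProteinIdLookup {
    private ProteinIdLookup() {}

    /**
     * Looks up the Id of every accession number, preparing the SELECT
     * exactly once and only rebinding its parameter inside the loop.
     */
    static long[] lookupIds(Connection connection, List<String> accessions)
            throws SQLException {
        long[] ids = new long[accessions.size()];
        // Prepared once, outside the loop; closed automatically at the end.
        try (PreparedStatement getProteinIdQuery = connection.prepareStatement(
                "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?")) {
            for (int i = 0; i < accessions.size(); i++) {
                getProteinIdQuery.setString(1, accessions.get(i));
                // Each result set is closed before the next query runs.
                try (ResultSet result = getProteinIdQuery.executeQuery()) {
                    if (!result.next()) {
                        throw new SQLException(
                            "No protein found for accession " + accessions.get(i));
                    }
                    ids[i] = result.getLong(1);
                }
            }
        }
        return ids;
    }
}
```

Unlike the snippets in the question, this version checks the return value of result.next(), so a missing protein surfaces as an exception instead of an error on getLong.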
Answer 1 (score: 2)
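Preparing a statement takes time, as you've discovered. Whether or not the code is ugly, the slowdown is pretty ugly too, so you need to use the faster form.
What you can do, though, is use an inner class to hold the details and provide a nicer interface: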
private class DatabaseInterface {
    private PreparedStatement insertAnnotation, getProteinId;

    public DatabaseInterface() throws SQLException {
        // This is an inner class; 'connection' is a field of the outer class
        insertAnnotation = connection.prepareStatement(
            "INSERT INTO Annotations(GOId, ProteinId, OnthologyId) VALUES(?, ?, ?)");
        getProteinId = connection.prepareStatement(
            "SELECT Id FROM Proteins WHERE PrimaryUniProtKBAccessionNumber = ?");
    }

    public long getId(Protein protein) throws SQLException {
        getProteinId.setString(1, protein.get_primary_id());
        ResultSet result = getProteinId.executeQuery();
        try {
            result.next();
            return result.getLong(1);
        } finally {
            result.close(); // release the result set before the next lookup
        }
    }

    public void insertAnnotation(int GOId, long proteinId, String ontologyId)
            throws SQLException {
        insertAnnotation.setInt(1, GOId); // type may be wrong
        insertAnnotation.setLong(2, proteinId);
        insertAnnotation.setString(3, ontologyId); // type may be wrong
        insertAnnotation.executeUpdate();
    }
}

private void loadAnnotations(InteractionNetwork network) throws SQLException {
    connection.setAutoCommit(false);
    DatabaseInterface dbi = new DatabaseInterface();
    for (common.Protein protein : network.get_protein_vector()) {
        dbi.insertAnnotation(..., dbi.getId(protein), ...);
    }
    connection.commit();
    connection.setAutoCommit(true);
}
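The goal is to have one piece of code that knows how to turn things into SQL (and is easy to adapt if you move to another database) and another piece of code that knows how to coordinate those things.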
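
One way to picture that split, as a rough sketch only: the coordinating code talks to a small interface, and whatever implements the interface is the only part that would contain SQL. All names here (AnnotationStore, InMemoryAnnotationStore, AnnotationLoader) are hypothetical and do not come from the code above. An in-memory implementation stands in for the JDBC one so the example runs without a database, and the GO id and ontology id passed to load are placeholder parameters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical storage boundary: a JDBC implementation would hold the SQL. */
interface AnnotationStore {
    long getProteinId(String accession);
    void insertAnnotation(int goId, long proteinId, String ontologyId);
}

/** In-memory stand-in, used here only so the sketch needs no database. */
final class InMemoryAnnotationStore implements AnnotationStore {
    private final Map<String, Long> proteinIds = new HashMap<>();
    final List<String> rows = new ArrayList<>();

    void addProtein(String accession, long id) {
        proteinIds.put(accession, id);
    }

    @Override
    public long getProteinId(String accession) {
        Long id = proteinIds.get(accession);
        if (id == null) {
            throw new IllegalArgumentException("Unknown accession: " + accession);
        }
        return id;
    }

    @Override
    public void insertAnnotation(int goId, long proteinId, String ontologyId) {
        // Records the tuple as text; a real store would run an INSERT.
        rows.add(goId + "," + proteinId + "," + ontologyId);
    }
}

/** Coordinator: decides what gets stored, never how it is stored. */
final class AnnotationLoader {
    private final AnnotationStore store;

    AnnotationLoader(AnnotationStore store) {
        this.store = store;
    }

    void load(List<String> accessions, int goId, String ontologyId) {
        for (String accession : accessions) {
            store.insertAnnotation(goId, store.getProteinId(accession), ontologyId);
        }
    }
}
```

Under this arrangement, switching databases would mean writing another AnnotationStore implementation while AnnotationLoader stays unchanged.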