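I am trying to load a large text file (between 400 and 800 MB) and insert its records into a database, but I am running into performance and memory problems (not enough heap space). I would like to know whether there is a better way to do what I am currently doing.
The text file has a simple fixed-width format, and this is how I read it: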
ArrayList<ArrayList<String>> fields = new ArrayList<ArrayList<String>>();
while ((line = br.readLine()) != null) {
    // one list per line; a single shared list would put every row's fields into the same list
    ArrayList<String> data = new ArrayList<String>();
    if (line.length() >= 6)
        data.add(line.substring(0, 6));
    if (line.length() >= 24)
        data.add(line.substring(6, 15));
    if (line.length() >= 30)
        data.add(line.substring(15, 20));
    if (line.length() >= 48)
        data.add(line.substring(20, 25));
    ...
    fields.add(data); //it looks like [[00000, Andy , 8920,..],[00001, Roger, ...]]
} //end read
System.gc();
db.insertValues(fields);
Current approach: read every line, extract the fields, then build the query in the database code:
public void insertValues(ArrayList<ArrayList<String>> data) throws SQLException {
    // try-with-resources closes the statement and the connection even if the batch fails
    try (Connection con = getConnection();
         PreparedStatement ps = con.prepareStatement("Insert into CUST_ACCT "
            + "(CID,NAME,R_NUM,CKM_IND,DATE_1,DATE_2,DATE_3,DATE_4,DATE_5,DATE_6,DATE_7,DATE_8,DATE_9,DATE_10,NUMBER_1,NUMBER_2,NUMBER_3,NUMBER_4,NUMBER_5,NUMBER_6,NUMBER_7,NUMBER_8,NUMBER_9,NUMBER_10,STRING_1,STRING_2,STRING_3,STRING_4,STRING_5,STRING_6,STRING_7,STRING_8,STRING_9,STRING_10,GUID,PARN_GUID,LAST_UPDT_DATE_TIME_STAMP)"
            + " values "
            + "(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,sysdate)")) {
        for (int i = 0; i < data.size(); i++) {
            ps.setString(1, data.get(i).get(0)); //0
            ps.setString(2, data.get(i).get(1)); //1
            ps.setString(3, data.get(i).get(2)); //2
            ps.setString(4, data.get(i).get(3)); //3
            ...
            ps.addBatch();
        }
        // every row of the file is sent in one single batch
        int[] counts = ps.executeBatch();
        log.info("total of record inserted: " + counts.length);
    }
}
But I get a "Not enough heap space" error. I also tried building the query row by row, but then the records are inserted one at a time, and after an hour only 20k of several million records had been inserted. Is there a better way to load the data?
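Answer 0 (score: 1)
You load the whole file into memory and only then process it line by line, which causes the performance and memory problems (heap space and so on).
You can use Scanner to read the file, so that it is read line by line without being loaded into memory in full: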
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // db insert!
    }
    // Scanner swallows IOExceptions, so check for one explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
Alternatively, use Apache Commons IO:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
        // db insert
    }
} finally {
    LineIterator.closeQuietly(it);
}
For better performance, I suggest opening the connection only once:
// your logic....
Connection con = null;
LineIterator it = null;
try {
    con = getConnection(); // opened once and reused for every line
    it = FileUtils.lineIterator(theFile, "UTF-8");
    // reading file logic
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
        insertValues(con, line);
        // other logic
    }
    // checking exception etc
} finally {
    LineIterator.closeQuietly(it);
    if (con != null) {
        con.close();
    }
}
In summary:
I hope this makes sense. These are only simple examples, and you will need to adapt them to your needs!
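Answer 1 (score: 1)
Don't read the whole file. Read 1000 lines, insert them with a prepared statement, and commit the transaction after that. Then read the next 1000, and so on.
Also, I believe Oracle has dedicated tools for loading data (search for SQL*Loader and Data Pump).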
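The read-1000-then-commit idea could look roughly like the sketch below. It is only an illustration under assumptions: the class name ChunkedInsert, the method load, and the shortened two-column statement are hypothetical (the real table has many more columns), and the caller is assumed to supply an open Connection and a BufferedReader over the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ChunkedInsert {
    // Hypothetical, shortened statement: only the first two columns of CUST_ACCT.
    static final String SQL = "insert into CUST_ACCT (CID, NAME) values (?, ?)";

    /**
     * Streams lines from the reader and sends them to the database in
     * batches of batchSize rows, committing after each batch.
     * Returns the number of rows that were added to a batch.
     */
    static long load(Connection con, BufferedReader reader, int batchSize)
            throws IOException, SQLException {
        con.setAutoCommit(false); // commit explicitly, once per batch
        long total = 0;
        int pending = 0;
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.length() < 15) {
                    continue; // too short to hold both fields, skip it
                }
                ps.setString(1, line.substring(0, 6));
                ps.setString(2, line.substring(6, 15));
                ps.addBatch();
                total++;
                if (++pending == batchSize) {
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partially filled batch
                ps.executeBatch();
                con.commit();
            }
        }
        return total;
    }
}
```

Only one batch of rows is held in memory at any time, so heap usage no longer depends on the file size. The sketch does not roll back on failure; the caller would have to decide how to handle a batch that fails halfway through the file.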
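Answer 2 (score: -1)
Let me check that I have understood what you need:
You have a large file, and for each line in the file you need to insert rows into one or more database tables. Is that right?
If so, have you tried Oracle's "SQL*Loader" tool? I have not tested it with a file this large, but it might be a solution. You can invoke it from your Java application.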
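Calling SQL*Loader from Java might look something like the following sketch. Treat it as an assumption-laden outline rather than a tested setup: the class SqlLoaderSketch and its methods are hypothetical, the control file covers only the first four fixed-width fields (positions taken from the substring calls in the question, converted to 1-based inclusive) plus the timestamp column, and the sqlldr executable is assumed to be on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class SqlLoaderSketch {
    /** Builds a partial control file; the remaining columns would need to be added. */
    static String controlFile(String dataFile) {
        return String.join("\n",
            "LOAD DATA",
            "INFILE '" + dataFile + "'",
            "APPEND",
            "INTO TABLE CUST_ACCT",
            "(",
            "  CID     POSITION(1:6)   CHAR,",
            "  NAME    POSITION(7:15)  CHAR,",
            "  R_NUM   POSITION(16:20) CHAR,",
            "  CKM_IND POSITION(21:25) CHAR,",
            "  LAST_UPDT_DATE_TIME_STAMP SYSDATE",
            ")",
            "");
    }

    /** Builds the sqlldr command line; credentials has the form user/password@db. */
    static List<String> command(String credentials, String controlPath, String logPath) {
        return Arrays.asList("sqlldr",
            "userid=" + credentials,
            "control=" + controlPath,
            "log=" + logPath);
    }

    /** Writes the control file to a temp location, runs sqlldr, returns its exit code. */
    static int run(String credentials, String dataFile)
            throws IOException, InterruptedException {
        Path ctl = Files.createTempFile("cust_acct", ".ctl");
        Files.write(ctl, controlFile(dataFile).getBytes(StandardCharsets.UTF_8));
        Process process = new ProcessBuilder(command(credentials, ctl.toString(), ctl + ".log"))
            .inheritIO() // show sqlldr output on this process's console
            .start();
        return process.waitFor();
    }
}
```

Passing the password on the command line exposes it to other users on the same machine, so a real setup would probably want a different way of supplying credentials. The log file written by sqlldr is the place to look for rejected rows.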