Building a dynamic query for a large file

Time: 2017-10-30 14:26:44

Tags: java sql oracle performance

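I am trying to load a large text file (between 400 and 800 MB) and insert its records into a database, but I keep running into performance and memory problems (not enough heap space). I would like to know whether there is a better approach than what I am doing now.

The text file I load has a simple format. Here is how I read it: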

ArrayList<ArrayList<String>> fields = new ArrayList<ArrayList<String>>();
while ((line = br.readLine()) != null) {
    // one list per record; a single shared list would mix every row together
    ArrayList<String> data = new ArrayList<String>();
    if (line.length() >= 6)
        data.add(line.substring(0, 6));
    if (line.length() >= 24)
        data.add(line.substring(6, 15));
    if (line.length() >= 30)
        data.add(line.substring(15, 20));
    if (line.length() >= 48)
        data.add(line.substring(20, 25));
    ...
    fields.add(data); // it looks like [[00000, Andy   , 8920,..],[00001, Roger, ...]]
} // end read
System.gc();
db.insertValues(fields);

Current approach: read each line, extract the fields, then build the query:

public void insertValues(ArrayList<ArrayList<String>> data) throws SQLException {
    PreparedStatement ps = null;
    Connection con = null;
    try {
        con = getConnection();
        ps = con.prepareStatement("Insert into CUST_ACCT "
                + "(CID,NAME,R_NUM,CKM_IND,DATE_1,DATE_2,DATE_3,DATE_4,DATE_5,DATE_6,DATE_7,DATE_8,DATE_9,DATE_10,NUMBER_1,NUMBER_2,NUMBER_3,NUMBER_4,NUMBER_5,NUMBER_6,NUMBER_7,NUMBER_8,NUMBER_9,NUMBER_10,STRING_1,STRING_2,STRING_3,STRING_4,STRING_5,STRING_6,STRING_7,STRING_8,STRING_9,STRING_10,GUID,PARN_GUID,LAST_UPDT_DATE_TIME_STAMP)"
                + " values "
                + "(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,sysdate)");
        for (int i = 0; i < data.size(); i++) {
            ps.setString(1, data.get(i).get(0)); // CID
            ps.setString(2, data.get(i).get(1)); // NAME
            ps.setString(3, data.get(i).get(2)); // R_NUM
            ps.setString(4, data.get(i).get(3)); // CKM_IND
            ...
            ps.addBatch();
        }
        int[] counts = ps.executeBatch();
        log.info("total of record inserted: " + counts.length);
    } finally {
        // release the statement and the connection even if the batch fails
        if (ps != null) ps.close();
        if (con != null) con.close();
    }
}

With this code I get the following error:

Not enough heap space

I also tried building the query so that it inserts the records one at a time, but after an hour it had inserted only about 20k of several million records. Is there a better way to load the data?

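3 answers:

Answer 0 (score: 1)

You load the entire file into memory and only then process it line by line, which causes both the performance and the memory problems (heap space and so on).

You can read the file with Scanner, which goes through it line by line without loading it all into memory.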

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // db insert!
    }
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

Alternatively, use Apache Commons IO:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
        // db insert
    }
} finally {
    LineIterator.closeQuietly(it);
}

For better performance, I suggest opening the connection only once:

   // your logic....
   Connection con = getConnection();
   LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
   try {
       // reading file logic
       while (it.hasNext()) {
           String line = it.nextLine();
           // do something with line
           insertValues(con, line);
           // other logic
       }
       // checking exception etc
   } finally {
       // close the iterator first, then the single shared connection
       LineIterator.closeQuietly(it);
       if (con != null) {
           con.close();
       }
   }

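Summary:

  1. Read the file line by line without loading it into memory.
  2. Open the connection only once (or a few times, not once per insert).
  3. Pass the connection object to the insert method.
  4. Close everything when you are done.
  5. I hope this is clear. These are simple examples that you will need to adapt to your own needs!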
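
As one possible reading of steps 1 to 3, here is a minimal sketch of what could happen to each line right after it is read. The class name FixedWidthLine and its methods are hypothetical. The offsets are the four visible in the question's substring calls; the rest are elided there, so they are left out here. Unlike the question's code, a field is kept only when the line reaches that field's end offset, and a missing field becomes null, which is a simplification rather than the asker's exact rule. The statement passed to bind is assumed to have one placeholder per parsed field.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class FixedWidthLine {
    // {start, end} offsets of the first four fields, taken from the question's
    // substring calls. Further fields are elided in the question and omitted here.
    static final int[][] OFFSETS = { {0, 6}, {6, 15}, {15, 20}, {20, 25} };

    /** Splits one line into fields; a field the line is too short for is null. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>(OFFSETS.length);
        for (int[] range : OFFSETS) {
            fields.add(line.length() >= range[1]
                    ? line.substring(range[0], range[1])
                    : null);
        }
        return fields;
    }

    /** Binds the fields of one line to an already prepared INSERT and queues it. */
    static void bind(PreparedStatement ps, String line) throws SQLException {
        List<String> fields = parse(line);
        for (int i = 0; i < fields.size(); i++) {
            ps.setString(i + 1, fields.get(i)); // JDBC parameter indexes are 1-based
        }
        ps.addBatch(); // queued only; the caller decides when to executeBatch()
    }
}
```

Because only one line and one small list are alive at a time, memory use no longer grows with the file size. The statement is prepared once by the caller and reused for every line.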

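Answer 1 (score: 1)

Don't read the whole file. Read 1000 lines, insert them with a prepared statement, and commit the transaction afterwards. Then read the next 1000, and so on.

Also, I think Oracle has dedicated tools for loading data (search for SQL*Loader and Data Pump).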
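
The read-1000, insert, commit cycle could be sketched roughly as follows. ChunkedLoader, ChunkSink, forEachChunk and insertChunk are hypothetical names, not part of any library. The sketch assumes auto-commit has been switched off on the connection and binds only the first column (CID) for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ChunkedLoader {
    /** Receives one chunk of lines at a time (hypothetical callback type). */
    interface ChunkSink {
        void accept(List<String> lines) throws SQLException;
    }

    /** Streams the reader, passing at most chunkSize lines per call; returns the line count. */
    static long forEachChunk(BufferedReader reader, int chunkSize, ChunkSink sink)
            throws IOException, SQLException {
        List<String> chunk = new ArrayList<>(chunkSize);
        long total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line);
            total++;
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                chunk = new ArrayList<>(chunkSize); // start a fresh chunk
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // final, partially filled chunk
        }
        return total;
    }

    /** Sends one chunk as a JDBC batch and commits it; assumes auto-commit is off. */
    static void insertChunk(Connection con, PreparedStatement ps, List<String> lines)
            throws SQLException {
        for (String line : lines) {
            // CID column; assumes every line has at least 6 characters
            ps.setString(1, line.substring(0, 6));
            // bind the remaining columns here in the same way
            ps.addBatch();
        }
        ps.executeBatch();
        con.commit();
    }
}
```

A caller would first call con.setAutoCommit(false) and then something like forEachChunk(reader, 1000, chunk -> insertChunk(con, ps, chunk)). At most one chunk of lines is held in memory at any time, and each commit covers one batch rather than one row or the whole file.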

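Answer 2 (score: -1)

Let me see whether I have understood your needs correctly:

You have a large file, and for each line in the file you need to insert into several tables in the database. Did I understand that correctly?

If so, have you tried Oracle's "SQL*Loader" tool? I have not tested it with a file this large, but it might be a solution. You can call it from your Java application.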
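
Calling SQL*Loader from Java might look roughly like the sketch below, which starts the sqlldr command-line tool through ProcessBuilder. SqlLoaderRunner and its methods are hypothetical names. The sketch assumes the Oracle client tools are installed and on the PATH, and that a control file describing the fixed-width layout already exists. All file names and credentials are supplied by the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SqlLoaderRunner {
    /** Builds the sqlldr command line from caller-supplied values. */
    static List<String> buildCommand(String userid, String controlFile,
                                     String dataFile, String logFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sqlldr");
        cmd.add("userid=" + userid);       // e.g. user/password@service
        cmd.add("control=" + controlFile); // control file describing the layout
        cmd.add("data=" + dataFile);       // the large input file
        cmd.add("log=" + logFile);         // where sqlldr writes its log
        return cmd;
    }

    /** Runs the command and returns the process exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward sqlldr's output to this process's console
                .start();
        return process.waitFor();
    }
}
```

A non-zero exit code suggests checking the log file. Note that passing credentials on the command line can expose them to other users on the same machine, so a real setup may prefer another way to supply them.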