从文本文件加载数据的最快方法,然后将其存储到数据库中

时间:2011-02-11 13:56:23

标签: c# ms-access import text-files

我有问题。

我正在开发一个项目,但我陷入了这个部分:

我想从文本文件加载数据并将其存储到数据库访问中 事物是每个文本文件中的数据大约12.000行数据 每个文本文件大约需要10分钟来处理它..

注意:在存储数据之前,我将每行数据与文本文件分开并将其放入字符串然后检查数据是否已经在数据库中。 如果在数据库中我更新它。如果没有,那么我使用insert语句..

我正在使用C#来开发这个程序? 有没有最快的方法来加载和存储这些数据?

更新:

这是我的代码,我希望它有助于理解我的问题:

    using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Collections;
using System.Data.OleDb;

namespace DAF
{
    public partial class FrontForm : Form
    {
        public Boolean status;

        public FrontForm()
        {
            InitializeComponent();

            //define location of the database
            string connection = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\DAFLogisticDepartment.mdb";

            //define location of the text file data
            DirectoryInfo di = new DirectoryInfo(@"C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\");
            FileInfo[] fiarr = di.GetFiles("*.txt");


            //define connection to database
            OleDbConnection con = new OleDbConnection(connection);
            String query;
            OleDbDataReader rdr = null;

            con.Open();
            //get all table from database
            OleDbCommand cmd = con.CreateCommand();
            DataTable dt = con.GetSchema("tables");
            DataRow[] dttable = dt.Select();
            con.Close();

            //read each new textfile inside the folder
            foreach (FileInfo fri in fiarr)
            {
                StreamReader sr = new StreamReader(fri.FullName, System.Text.Encoding.Default);
                String line;
                String tabledbs, dbsName;

                while ((line = sr.ReadLine()) != null)
                {
                    String VRSD, locationID, truckID, yearIn, yearOut, weekIn, weekOut, dayIn, dayOut, timeIn, timeOut, route;
                    int plantID;

                    //process each line of data and put into each variable
                    VRSD = line.Substring(0, 4).Trim();
                    plantID = Convert.ToInt32(line.Substring(4, 1).Trim());
                    locationID = line.Substring(5, 4).Trim();
                    truckID = line.Substring(24, 5).Trim();
                    yearIn = line.Substring(32, 4).Trim();
                    weekIn = line.Substring(36, 2).Trim();
                    dayIn = line.Substring(38, 1).Trim();
                    timeIn = line.Substring(39, 8).Trim();
                    yearOut = line.Substring(47, 4).Trim();
                    weekOut = line.Substring(51, 2).Trim();
                    dayOut = line.Substring(53, 1).Trim();
                    timeOut = line.Substring(54, 8).Trim();
                    route = line.Substring(64, 2).Trim();

                    //make database name
                    dbsName = plantID + locationID;

                    con.Open();
                    //check if the table exist in database
                    for (int i = 0; i < dttable.Length - 9; i++)
                    {
                        tabledbs = dttable[i]["TABLE_NAME"].ToString();
                        ArrayList indexlist = new ArrayList();

                        if (tabledbs == dbsName)
                        {
                            //if the table exist, status = true
                            status = true;
                            break;
                        }
                    }
                    con.Close();

                    con.Open();

                    if (status == true)
                    {

                        try
                        {        
                            //if the data not in the system, insert statement
                            query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + truckID + "','" + yearIn + "','" + weekIn + "','" + dayIn + "','" + timeIn + "','" + yearOut + "','" + weekOut + "','" + dayOut + "','" + timeOut + "')";
                            cmd = new OleDbCommand(query, con);
                            rdr = cmd.ExecuteReader();
                            con.Close();
                        }
                        catch
                        {
                            //if the data in the system, update statement
                            query = @"update " + dbsName + " set YearIn='" + yearIn + "', YearOut='" + yearOut + "', WeekIn='" + weekIn + "', WeekOut='" + weekOut + "', DayIn='" + dayIn + "', DayOut='" + dayOut + "', TimeIn='" + timeIn + "', TimeOut='" + timeOut + "' where LocationID='" + locationID + "' and PlantID=" + plantID + "";
                            cmd = new OleDbCommand(query, con);
                            rdr = cmd.ExecuteReader();
                            con.Close();
                        }

                    }
                    else
                    {
                        //create new table
                        string attribute = "VRSD String,PlantID Integer, LocationID String, TruckID String," +
                                            "YearIn String, WeekIn String, DayIn String, TimeIn String," +
                                            "YearOut String, WeekOut String, DayOut String, TimeOut String";

                        query = @"CREATE TABLE " + plantID + locationID + "(" + attribute + ")";
                        cmd = new OleDbCommand(query, con);
                        cmd.ExecuteNonQuery();

                        //insert the data
                        query = @"insert into " + plantID + locationID + " values('" + VRSD.ToString() + "'," + plantID + ",'" + locationID + "','" + truckID + "','" + yearIn + "','" + weekIn + "','" + dayIn + "','" + timeIn + "','" + yearOut + "','" + weekOut + "','" + dayOut + "','" + timeOut + "')";
                        cmd = new OleDbCommand(query, con);
                        rdr = cmd.ExecuteReader();
                        con.Close();
                    }

                    status = false;
                }
                sr.Close();

                //after the text file load into database, the text file moved to history folder
                MessageBox.Show(fri.FullName.ToString(), "File Manager", MessageBoxButtons.OK);
                fri.MoveTo(@"C:\Users\PC\Desktop\Graduation Project\Research\DAF\Data\History\" + fri.Name.ToString() + ".txt");
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {
            StandardReport sr = new StandardReport();
            sr.Show();
        }

        private void FrontForm_Load(object sender, EventArgs e)
        {

        }
    }
}

4 个答案:

答案 0 :(得分:2)

这里的大时间杀手将是你正在使用的数据库连接的数量 - 尝试构建一个内存中的命令列表(与从文件中读取数据相比,每个对象几乎不需要花费时间),并且你已经建立了你的列表,通过一个连接执行它们。打开每个连接需要花费一些时间,而且这样做的次数远远超过需要。编辑 - 实际上注意到每个文件每行打开/关闭2个连接!


目前(为清晰起见,伪代码):

For each file (x6)

   Load file from stream

   For each line in file (x12k)

     Read data from line

     Open database connection (happens 72k times)
     Check whether table exists
     Close connection

     Open connection (x72k)
     Try to insert record
     If inserting fails, update existing record
     Close connection

     Next line

   Close filestream

Next file

建议:(强烈建议你考虑动态添加表格的含义,它通常不是一个好的解决方案,但如果强加给你,你可能别无选择)

Create an in-memory list of commands 
  (or list of custom objects with property for each command type, create 
  table,insert,update)    

For each file (x6)

   Load file from stream

   For each line in file (x12k)

      Read data from line (all happens 72k times, but no external connections per line)

      Write your create table command
      Write your insert command
      Write your update command
      Add to relevent command lists

   Next Line

   Close filestream

Next File

Open database connection (x1)

For each command in your list
   Apply suitable logic as to whether command needs to execute
   Execute command if applicable
Next command

Close database connection

答案 1 :(得分:1)

为什么不尝试创建和使用SSIS包?它非常擅长这种东西,具有出色的工具,并且从代码中使用起来非常简单

http://msdn.microsoft.com/en-us/library/ms141026.aspx

http://blogs.msdn.com/b/michen/archive/2007/03/22/running-ssis-package-programmatically.aspx

答案 2 :(得分:1)

您可以使用查询使用Jet驱动程序为文本插入不匹配的记录。

SELECT a.* INTO NewTable FROM 
(SELECT * From [Text;DSN=Import Link Specification;FMT=Delimited;HDR=NO;IMEX=2;CharacterSet=850;DATABASE=C:\Docs].[Import.txt]) As A
LEFT JOIN OldTable ON a.Key=OldTable.Key
WHERE a.Key Is Null

修改

我想知道为什么你没有包含所有植物和位置的主表。然后,您可以将所有文件插入到临时表中,并相应地从temp追加或更新。

foreach (FileInfo fri in fiarr)
    {
    string s = "[Text;DSN=Test Spec;"
         + "FMT=Fixed;HDR=Yes;IMEX=2;CharacterSet=850;DATABASE=" 
         + fri.DirectoryName + "].["
         + fri.Name + "]";

    query = "INSERT INTO Temp SELECT * FROM " + s;

    cmd.ExecuteNonQuery();
    }

您似乎使用固定长度格式,因此DSN=Test Spec是通过以固定宽度格式导出文件然后使用“高级”按钮保存规范来创建Access规范。

答案 3 :(得分:0)

这里的一个问题可能是您使用SINGLE SQL EXECUTE语句逐行插入每个记录。

另一种解决方案是:

  1. 将文本文件读入字符串缓冲区(20.000行)
  2. 创建DataTable对象
  3. 在循环中,将每一行插入DataTable对象。
  4. 最后,使用DataAdapter,将DataTable写回数据库。