Question

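I have an Access database containing a table with roughly 3,500,000 rows. One of the columns is a 10-digit 'Phone_No'. Searching a table this large takes too long, so I decided to split it into multiple tables based on the first 5 digits of 'Phone_No'.

For example, if I have a 'Phone_No' value of 1234500000, I create a new table named 12345 and insert into it all rows whose 'Phone_No' starts with 12345, i.e. 1234500001, 1234500002, and so on.

This way, if I have to search for a 'Phone_No', I just need to run 'SELECT * FROM 12345 WHERE Phone_No = 1234500000'. This narrows my search to a much smaller data set.

The sample code is shown below.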

        int iRow = 0;
        int iRowCount;

        string strIndex;
        string strOutputTableName;

        DataTable dtOutput;

        DataRow dr;

        try
        {
            // Get staging row count
            iRowCount = dtStaging.Rows.Count;
            dsOutput = new DataSet();

            // Iterate each row for splitting
            while (dtStaging.Rows.Count > 0)
            {
                // Increment row count
                iRow++;

                // Get current Datarow
                dr = dtStaging.Rows[0];

                // Get column to be indexed by splitting
                strIndex = dr["Phone_No"].ToString();

                // Set output table name
                strOutputTableName = strIndex.Substring(0, 5);

                try
                {
                    // Create new datatable 
                    dtOutput = new DataTable(strOutputTableName);

                    // Add columns
                    dtOutput.Columns.Add("Phone_No", typeof(string));
                    dtOutput.Columns.Add("Subscriber_Name", typeof(string));
                    dtOutput.Columns.Add("Address_Line1", typeof(string));
                    dtOutput.Columns.Add("Address_Line2", typeof(string));
                    dtOutput.Columns.Add("Address_Line3", typeof(string));
                    dtOutput.Columns.Add("City", typeof(string));
                    dtOutput.Columns.Add("Pin_Code", typeof(string));
                    dtOutput.Columns.Add("SIM_Activation_Date", typeof(DateTime));

                    // Add datatable to dataset
                    dsOutput.Tables.Add(dtOutput);
                }
                // catch table already exists error and proceed
                catch
                {

                }

                // Import current datarow from staging to output table
                dsOutput.Tables[strOutputTableName].ImportRow(dr);

                // Report progress
                bwSplit.ReportProgress((iRow * 100) / iRowCount);

                // Remove current datarow to release some memory
                dtStaging.Rows.RemoveAt(0);
            }
        }
        catch
        {
            throw;
        }
        finally
        {
            dtStaging = null;
        }

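Once the DataSet dsOutput is ready, I insert it into another Access database file. The problem is that this splitting process takes a very long time. I am looking for an optimized version of the above code, or any different approach that would make the splitting process faster.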
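For reference, one likely cost in the loop above is that it builds a new DataTable with eight columns for every row and relies on a thrown exception to detect that the table already exists. A minimal sketch of the same split with a dictionary lookup instead is shown here. The names PhoneTableSplitter and SplitByPrefix are hypothetical, and the sketch assumes the staging table already has the desired output columns, so its schema can be copied with Clone(). It has not been timed against the original.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class PhoneTableSplitter
{
    // Hypothetical helper: groups the rows of 'staging' into one DataTable
    // per Phone_No prefix, creating each output table only once.
    public static DataSet SplitByPrefix(DataTable staging, int prefixLength = 5)
    {
        var output = new DataSet();
        var tablesByPrefix = new Dictionary<string, DataTable>();

        foreach (DataRow row in staging.Rows)
        {
            // Like the original code, this throws if Phone_No is shorter
            // than prefixLength characters.
            string prefix = row["Phone_No"].ToString().Substring(0, prefixLength);

            DataTable target;
            if (!tablesByPrefix.TryGetValue(prefix, out target))
            {
                // Clone() copies the schema (columns) but no rows.
                target = staging.Clone();
                target.TableName = prefix;
                output.Tables.Add(target);
                tablesByPrefix.Add(prefix, target);
            }

            target.ImportRow(row);
        }

        return output;
    }
}

static class SplitDemo
{
    static void Main()
    {
        // Tiny in-memory stand-in for dtStaging (assumption: real data
        // would be loaded from the Access file).
        var staging = new DataTable("Staging");
        staging.Columns.Add("Phone_No", typeof(string));
        staging.Columns.Add("Subscriber_Name", typeof(string));
        staging.Rows.Add("1234500000", "A");
        staging.Rows.Add("1234500001", "B");
        staging.Rows.Add("9876500000", "C");

        DataSet result = PhoneTableSplitter.SplitByPrefix(staging);
        foreach (DataTable table in result.Tables)
        {
            Console.WriteLine("{0}: {1} row(s)", table.TableName, table.Rows.Count);
        }
    }
}
```

Unlike the original loop, this sketch does not remove rows from the staging table as it goes and does not report progress, so both would need to be added back if they matter.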

Answer 1

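As mentioned in several comments on the question, indexing is the first thing to consider when performance is an issue.

I just ran a test on a 170MB .accdb file on a network share. The file contains a single table with 3.5 million rows whose primary key is a 10-digit phone number (Text(10)). Executing a SELECT to retrieve the row containing a specific phone number, e.g.,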

SELECT CompanyName FROM PhoneNumbers WHERE PhoneNumber='4036854444'

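consistently takes less than 0.1 seconds. To me, that does not seem slow enough to warrant any kind of scheme to split the data into multiple tables.

For more information on performance considerations for applications that use an Access database, see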
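A rough sketch of how such a lookup could be issued and timed from C# follows. It reuses the table and column names from the test query above. The file path, the index name IX_PhoneNumber, and the class and method names are made up for illustration, and the sketch assumes the Microsoft ACE OLE DB provider is installed. The CreatePhoneIndex helper is only relevant when the phone number column is not already the primary key or otherwise indexed.

```csharp
using System;
using System.Data.OleDb;
using System.Diagnostics;

static class IndexedLookupSketch
{
    // Hypothetical path; substitute the real .accdb file.
    const string ConnectionString =
        @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Phones.accdb;";

    // One-time step for a table whose phone column has no index yet.
    // Not needed when the column is already the primary key.
    static void CreatePhoneIndex(OleDbConnection connection)
    {
        const string sql = "CREATE INDEX IX_PhoneNumber ON PhoneNumbers (PhoneNumber)";
        using (var command = new OleDbCommand(sql, connection))
        {
            command.ExecuteNonQuery();
        }
    }

    static void Main()
    {
        using (var connection = new OleDbConnection(ConnectionString))
        {
            connection.Open();

            // CreatePhoneIndex(connection); // uncomment once if no index exists

            const string sql =
                "SELECT CompanyName FROM PhoneNumbers WHERE PhoneNumber = ?";
            using (var command = new OleDbCommand(sql, connection))
            {
                // OLE DB parameters are positional; the name is a placeholder.
                command.Parameters.Add("?", OleDbType.VarWChar, 10).Value = "4036854444";

                Stopwatch timer = Stopwatch.StartNew();
                object companyName = command.ExecuteScalar();
                timer.Stop();

                Console.WriteLine("{0} ({1} ms)",
                    companyName ?? "(not found)", timer.ElapsedMilliseconds);
            }
        }
    }
}
```

Timings will vary with the network and with caching, so the figure printed here is only an indication and not a benchmark.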

C# program querying an Access database in a network folder takes longer than querying a local copy

