Reading a huge Excel column from C#

Time: 2013-04-06 05:28:37

Tags: c# multithreading excel spreadsheet

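I have an Excel document (it could be Excel 2010 or 2013; I don't know whether that will matter later) with four columns. The first three columns store phone numbers, essentially strings of 10 or more characters. The fourth column always holds 1, 2, 3 or 4 and represents a category. I need to check whether each number in column A also appears in column B and column C, so my plan is to read all the cells in each column and store them in lists (the comparison itself is not implemented yet because of the problem I explain below). To do that I wrote this code: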

private void btnCargarExcel_Click(object sender, EventArgs e)
        {
            if (this.openFileDialog1.ShowDialog() == DialogResult.OK)
            {

                if (System.IO.File.Exists(openFileDialog1.FileName))
                {
                    filePath.Text = openFileDialog1.FileName.ToString();

                    Excel.Application xlApp;
                    Excel.Workbook xlWorkBook;
                    Excel.Worksheet xlWorkSheet;
                    Excel.Range range;

                    string str;

                    int rCnt = 0;

                    xlApp = new Microsoft.Office.Interop.Excel.Application();
                    xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
                    xlWorkSheet = (Excel.Worksheet)xlWorkBook.Worksheets.get_Item(1);

                    range = xlWorkSheet.UsedRange;

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 1] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        bd.Add(cleanString(str, 10));
                    }

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 2] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        bl.Add(cleanString(str, 10));
                    }

                    for (rCnt = 1; rCnt <= range.Rows.Count; rCnt++)
                    {
                        str = (range.Cells[rCnt, 3] as Excel.Range).Value2.ToString();
                        //bd.Add(cleanString(str));

                        cm.Add(cleanString(str, 10));
                    }

                    nrosProcesados.Text = bd.Count().ToString();
                    listBox1.DataSource = bd;

                    noProcesadosBL.Text = bl.Count().ToString();
                    listBox2.DataSource = bl;

                    noProcesadosCM.Text = cm.Count().ToString();
                    listBox3.DataSource = cm;

                    xlWorkBook.Close(true, null, null);
                    xlApp.Quit();

                    releaseObject(xlWorkSheet);
                    releaseObject(xlWorkBook);
                    releaseObject(xlApp);
                }
                else
                {
                    MessageBox.Show("No se pudo abrir el fichero!");
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(appExcel);
                    appExcel = null;
                    System.Windows.Forms.Application.Exit();
                }
            }
        }

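So I iterate over the cells in each column and, after some string clean-up, store every number in a list, as the code shows. The problem is that column A has 797340 cells, column B has 91617 and column C has 95891, so when I run the application and load the Excel file my PC hangs (even with 12 GB of RAM and a Core i3 processor) and I have to open Task Manager and end the task. What is the best way to get what I want (keep only the numbers that are not repeated) without hanging my PC? Should I split the work so that each loop runs in a separate thread? (I don't know much about this since I am just starting with C#, so any help is appreciated.) What is your opinion on this?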
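Once the three columns are in memory, the comparison itself can be done with hash sets rather than nested loops. The following is only a minimal sketch, independent of Excel: it assumes the goal is "distinct numbers from column A that appear in neither B nor C" (the question leaves this open), and the names `PhoneColumns` and `NotInOtherColumns` are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class PhoneColumns
{
    // Hypothetical helper: returns the distinct values of columnA that occur
    // in neither columnB nor columnC, keeping their first-seen order.
    public static List<string> NotInOtherColumns(
        IEnumerable<string> columnA,
        IEnumerable<string> columnB,
        IEnumerable<string> columnC)
    {
        // Every value found in B or C; lookups are O(1) on average.
        var seen = new HashSet<string>(columnB);
        seen.UnionWith(columnC);

        var result = new List<string>();
        var emitted = new HashSet<string>(); // filters duplicates inside A
        foreach (var number in columnA)
        {
            // HashSet.Add returns false if the value was already present.
            if (!seen.Contains(number) && emitted.Add(number))
            {
                result.Add(number);
            }
        }
        return result;
    }
}
```

With the lists from the question this would be called as `PhoneColumns.NotInOtherColumns(bd, bl, cm)`. Each value is visited once, so the cost grows linearly with the number of cells instead of with the product of the column sizes.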

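EDIT: adding the new, cleaner approach

After a lot of reading and some help from other members, I improved the code a bit, but now I have another problem (described below the code). Here is the code: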

// this goes first when I declare vars
public static System.Array objRowAValues;

// this goes in action when I click the button (I leave only relevant part)
Excel.Application xlApp;
Excel.Workbook xlWorkBook;
Excel.Worksheet xlWorkSheet;
Excel.Range range, rngARowLast;

string str;
int rCnt = 0;

long lastACell, fullRow;

xlApp = new Microsoft.Office.Interop.Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(openFileDialog1.FileName, 0, true, 5, "", "", true, Microsoft.Office.Interop.Excel.XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
xlWorkSheet = (Excel.Worksheet) xlWorkBook.Worksheets.get_Item(1);

range = xlWorkSheet.UsedRange;

fullRow = xlWorkSheet.Rows.Count;
lastACell = xlWorkSheet.Cells[fullRow, 1].End(Excel.XlDirection.xlUp).Row;
rngARowLast = xlWorkSheet.get_Range("A1", "A" + lastACell);
objRowAValues = (System.Array) rngARowLast.Cells.Value;

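Now, since I will populate a ListBox with the values from objRowAValues, and a ListBox only accepts a List as its DataSource, I need to convert objRowAValues to a List of strings. I tried this, but it did not work for me. Any help?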
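A conversion along those lines could look like the sketch below. It assumes `objRowAValues` holds the two-dimensional, 1-based array that Excel returns for a multi-cell range (a single-cell range yields a scalar instead), and that empty cells should be skipped. The names `RangeValues` and `FirstColumnToList` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class RangeValues
{
    // Hypothetical helper: copies the first column of a 2-D array into a
    // List<string>. Bounds are read from the array itself, so it works for
    // the 1-based arrays Excel returns as well as ordinary 0-based ones.
    public static List<string> FirstColumnToList(Array values)
    {
        var list = new List<string>();
        if (values == null || values.Rank != 2)
        {
            return list; // not a 2-D array: nothing to copy
        }

        int column = values.GetLowerBound(1);
        for (int row = values.GetLowerBound(0); row <= values.GetUpperBound(0); row++)
        {
            object cell = values.GetValue(row, column);
            if (cell != null) // empty cells come back as null
            {
                // Numeric cells arrive as double, text cells as string.
                list.Add(Convert.ToString(cell, CultureInfo.InvariantCulture));
            }
        }
        return list;
    }
}
```

The result can then be bound directly, for example `listBox1.DataSource = RangeValues.FirstColumnToList(objRowAValues);`.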

1 Answer:

Answer 0 (score: 1)

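Unfortunately I am more of a VB.NET guy, so I converted some code for you. I hope it works out of the box; I don't have these tools available here, so I can't test it.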

// Requires: using System.Collections.Generic;
//           using Excel = Microsoft.Office.Interop.Excel;
public List<List<string>> test(Excel.Worksheet xlWorkSheet)
{
    Excel.Range range = xlWorkSheet.UsedRange;

    // Value2 on a multi-cell range returns the whole block in one call as a
    // 2-D array indexed [row, column], which avoids one COM call per cell.
    object[,] RaWData = (object[,])range.Value2;

    // I am using lists here because they are a lot easier to work with than plain arrays.
    List<List<string>> RealData = new List<List<string>>();

    // Start at 1: the array Excel delivers is 1-based in both dimensions,
    // one of the few non-zero-based arrays you will meet in .NET.
    for (int x = 1; x <= RaWData.GetUpperBound(0); x++) {
        List<string> templist = new List<string>();
        for (int y = 1; y <= RaWData.GetUpperBound(1); y++) {
            // Empty cells come back as null, so guard before calling ToString().
            object cell = RaWData[x, y];
            templist.Add(cell == null ? string.Empty : cell.ToString());
        }
        RealData.Add(templist);
    }

    // One inner list per worksheet row.
    return RealData;
}