Excel xlsx file parsing - using Koogra

Time: 2012-11-28 15:18:30

Tags: c# excel .net-4.0 xml-parsing text-parsing

I have been trying to parse/process a fairly large Excel document using several packages from GitHub. Every approach I tried threw an out-of-memory exception.

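I kept googling and found a GNU library called Koogra, which looks like the right fit for this job. I can't afford to keep searching much longer, because I am running out of time for this part of the project.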

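The code I have now gets past the "out of memory" problem, so the only thing left is how to parse the Excel document properly, so that I can extract, say, a dictionary whose keys come from one column and whose values come from another.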

this is the file in question

This is my code so far:

var path = Path.Combine(Environment.CurrentDirectory, "tst.xlsx");
Net.SourceForge.Koogra.Excel2007.Workbook xcel = new Net.SourceForge.Koogra.Excel2007.Workbook(path);
var ss = xcel.GetWorksheets();

1 answer:

Answer 0 (score: 5)

After some more digging and googling: the first line below is for the 2007 format (xlsx), and the second, commented-out line is for the xls version.

        Net.SourceForge.Koogra.IWorkbook genericWB = Net.SourceForge.Koogra.WorkbookFactory.GetExcel2007Reader("tst.xlsx");

        //genericWB = Net.SourceForge.Koogra.WorkbookFactory.GetExcelBIFFReader("some.xls");

        Net.SourceForge.Koogra.IWorksheet genericWS = genericWB.Worksheets.GetWorksheetByIndex(0);

        for (uint r = genericWS.FirstRow; r <= genericWS.LastRow; ++r)
        {
            Net.SourceForge.Koogra.IRow row = genericWS.Rows.GetRow(r);

            for (uint c = genericWS.FirstCol; c <= genericWS.LastCol; ++c)
            {
                // raw value
                Console.WriteLine(row.GetCell(c).Value);

                // formatted value
                Console.WriteLine(row.GetCell(c).GetFormattedValue());
            }
        }

I hope this helps anyone who runs into the same "out of memory" problem. Enjoy.

A small update to the code above

OK, I've played with it a bit. As far as the contents of this particular file go (a chart ranked by Unique IP), the current code is:

            // Place the source file in your project's bin\Debug directory;
            // the extracted text file is written next to the source file.
            var pathtoRead = Path.Combine(Environment.CurrentDirectory, "tst.xlsx");
            var pathtoWrite = Path.Combine(Environment.CurrentDirectory, "tst.txt");

            Net.SourceForge.Koogra.IWorkbook genericWB = Net.SourceForge.Koogra.WorkbookFactory.GetExcel2007Reader(pathtoRead);
            Net.SourceForge.Koogra.IWorksheet genericWS = genericWB.Worksheets.GetWorksheetByIndex(0);
            StringBuilder SbXls = new StringBuilder();
            for (uint r = genericWS.FirstRow; r <= genericWS.LastRow; ++r)
            {
                Net.SourceForge.Koogra.IRow row = genericWS.Rows.GetRow(r);
                for (uint col = genericWS.FirstCol; col <= genericWS.LastCol; ++col)
                {
                    // Only the first two columns are exported:
                    // column 0, a tab, column 1, then a line break.
                    if (col > 1)
                        break;

                    var formatted = row.GetCell(col).GetFormattedValue();
                    string lineEnding = col == 0 ? "\t" : Environment.NewLine;
                    SbXls.Append(string.Concat(formatted, lineEnding));
                }
            }
            File.WriteAllText(pathtoWrite, SbXls.ToString());
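
If you want the dictionary from the question rather than a text file, the same loop can fill a Dictionary instead of a StringBuilder. This is only a rough sketch: it uses just the Koogra calls shown above, `ReadColumnsAsDictionary` is a hypothetical helper name of mine (not part of Koogra), and the null checks assume that empty rows or cells may come back as null, which I have not verified. It needs a reference to the Koogra assembly to compile.

```csharp
using System;
using System.Collections.Generic;

static class KoograDictionarySketch
{
    // Hypothetical helper (not part of Koogra): maps the formatted text of
    // one column to the formatted text of another, one entry per row.
    public static Dictionary<string, string> ReadColumnsAsDictionary(
        string path, uint keyCol, uint valueCol)
    {
        Net.SourceForge.Koogra.IWorkbook wb =
            Net.SourceForge.Koogra.WorkbookFactory.GetExcel2007Reader(path);
        Net.SourceForge.Koogra.IWorksheet ws = wb.Worksheets.GetWorksheetByIndex(0);

        var result = new Dictionary<string, string>();
        for (uint r = ws.FirstRow; r <= ws.LastRow; ++r)
        {
            Net.SourceForge.Koogra.IRow row = ws.Rows.GetRow(r);
            if (row == null)
                continue; // assumption: an empty row may be returned as null

            var keyCell = row.GetCell(keyCol);
            var valueCell = row.GetCell(valueCol);
            if (keyCell == null || valueCell == null)
                continue; // assumption: an empty cell may be returned as null

            string key = Convert.ToString(keyCell.GetFormattedValue());
            if (string.IsNullOrEmpty(key))
                continue; // skip rows that have no usable key

            // The indexer overwrites, so the last row wins on duplicate keys.
            result[key] = Convert.ToString(valueCell.GetFormattedValue());
        }
        return result;
    }
}
```

Calling `KoograDictionarySketch.ReadColumnsAsDictionary(pathtoRead, 0, 1)` would use column 0 as the key and column 1 as the value. Note that a header row, if the sheet has one, ends up as an ordinary entry, so you may want to start the loop one row later.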