Getting all cells from a row using the Open XML SAX approach

Asked: 2015-09-03 23:00:57

Tags: c# excel openxml sax

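I just used the DOM approach of the Open XML SDK to read a large xlsx file. It works, but it takes forever. So I want to do the same thing with the SAX approach, but so far I have not gotten any results.

With the DOM approach, for each sheet in the workbook I get the sheet's name. I assume the first row contains all the column names, and I create a class at runtime with one property per column listed in that first row. Then I read the remaining rows. For each row I create a new object of the dynamically generated class, iterate over the cells in the row, and fill the object with the values I get.

Here is the code I use to perform the task I just described with the DOM approach.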

public static List<Object> ConvertExcelArchiveToListObjects(string filePath)
    {
        ... 
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
            Sheets theSheets = wbPart.Workbook.Sheets;

            SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            ...
            var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();

            foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
            {
                Worksheet worksheet = worksheetpart.Worksheet;

                string partRelationshipId = wbPart.GetIdOfPart(worksheetpart);
                var correspondingSheet = sheets.FirstOrDefault(
                    s => s.Id.HasValue && s.Id.Value == partRelationshipId);
                Debug.Assert(correspondingSheet != null);
                // Grab the sheet name
                string sheetName = correspondingSheet.GetAttribute("name", "").Value;

                ...

                dynamic expandoObjectClass = new ExpandoObject();
                List<Object> listObjectsCustomClasses  = new List<Object>();
                foreach (var dataRow in rowContent)
                {
                    Type generatedType = typeBuilder.CreateType();
                    object generatedObject = Activator.CreateInstance(generatedType);

                    PropertyInfo[] properties = generatedType.GetProperties();

                    int propertiesCounter = 0;

                    // Loop over the values that we will assign to the properties

                    var rowCells = dataRow.Descendants<Cell>();
                    var value = string.Empty;
                    foreach (var rowCell in rowCells)
                    {
                        if (rowCell.DataType != null
                            && rowCell.DataType.HasValue
                            && rowCell.DataType == CellValues.SharedString
                            && int.Parse(rowCell.CellValue.InnerText) < ssTable.ChildElements.Count)
                        {
                            value = ssTable.ChildElements[int.Parse(rowCell.CellValue.InnerText)].InnerText ?? string.Empty;
                        }
                        else
                        {
                            if (rowCell.CellValue != null && rowCell.CellValue.InnerText != null)
                            {
                                value = rowCell.CellValue.InnerText;
                            }
                            else
                            {
                                value = string.Empty;
                            }
                        }
                        properties[propertiesCounter].SetValue(generatedObject, value, null);
                        propertiesCounter++;
                    }
                    listObjectsCustomClasses.Add(generatedObject);
                }
                listObjects.Add(listObjectsCustomClasses);
            }
        }
        DateTime end = DateTime.UtcNow;
        Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
        return listObjects;
    }

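However, whenever I read a large xlsx file (over 30 MB), the method above takes a very long time to run. I have written the following code, which should at least get the rows without digging into the cells of each row.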

public static List<Object> ConvertExcelArchiveToListObjectsSAXApproach(string filePath)
    {
        DateTime begin = DateTime.UtcNow;
        List<Object> listObjects = new List<Object>();
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
            Sheets theSheets = wbPart.Workbook.Sheets;

            SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            SharedStringTable ssTable = null;
            if (sstPart != null)
                ssTable = sstPart.SharedStringTable;

            // Get the CellFormats for cells without defined data types
            WorkbookStylesPart workbookStylesPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<WorkbookStylesPart>().First();
            CellFormats cellFormats = (CellFormats)workbookStylesPart.Stylesheet.CellFormats;
            var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();

            foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
            {
                //Worksheet worksheet = worksheetpart.Worksheet;
                OpenXmlPartReader reader = new OpenXmlPartReader(worksheetpart);
                bool firstRow = false;

                while (reader.Read())
                {
                    if (reader.ElementType == typeof(Row))
                    {
                       ...
                    }

                    if (reader.ElementType != typeof(Worksheet)) // Don't want to skip the contents of the worksheet
                        reader.Skip(); // Skip contents of any node before finding the first row.
                }
            }
        }
        DateTime end = DateTime.UtcNow;
        Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
        return listObjects;
    }

However, the breakpoint I set inside

                    if (reader.ElementType == typeof(Row))
                    {
                       ...
                    }

is never even hit. Any ideas about what I am missing? Thanks!

1 answer:

Answer 0: (score: 0)

Have you seen the code in the thread "Using OpenXmlReader"? That code does exactly what you are trying to do.
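For illustration, here is a minimal sketch of reading every cell of every row with OpenXmlReader. It is not the code from the linked thread. The class name SaxRowReader and the helpers ReadRows and GetCellText are hypothetical names introduced for this example. A likely reason the breakpoint in the question is never hit is that Skip() also skips the children of the current element: the rows live under sheetData, which is not a Worksheet, so it is skipped together with all of its rows. The sketch therefore never calls Skip() and simply reads forward until it reaches a row start tag. It assumes the shared string table fits in memory, and it returns an empty string for cells without a value element, which includes inline strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SaxRowReader
{
    // Hypothetical helper (not from the question): streams the rows of every
    // worksheet and yields the cell texts of each row, one list per row.
    public static IEnumerable<List<string>> ReadRows(string filePath)
    {
        using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = document.WorkbookPart;

            // Shared strings are loaded once; cells refer to them by index.
            // Assumption: the table is small enough to hold in memory.
            SharedStringTablePart sstPart = wbPart.SharedStringTablePart;
            List<string> sharedStrings = sstPart == null
                ? new List<string>()
                : sstPart.SharedStringTable.Elements<SharedStringItem>()
                         .Select(item => item.InnerText)
                         .ToList();

            foreach (WorksheetPart worksheetPart in wbPart.WorksheetParts)
            {
                using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
                {
                    // No Skip() here: skipping sheetData would skip every row.
                    while (reader.Read())
                    {
                        if (reader.ElementType != typeof(Row) || !reader.IsStartElement)
                            continue;

                        // Load only the current row into a DOM element, so the
                        // familiar Cell API can be used for this one row.
                        Row row = (Row)reader.LoadCurrentElement();
                        yield return row.Elements<Cell>()
                                        .Select(cell => GetCellText(cell, sharedStrings))
                                        .ToList();
                    }
                }
            }
        }
    }

    // Resolves a cell to its text: shared strings by index, everything else raw.
    private static string GetCellText(Cell cell, List<string> sharedStrings)
    {
        if (cell.CellValue == null)
            return string.Empty;

        string raw = cell.CellValue.InnerText;
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            int index;
            if (int.TryParse(raw, out index) && index >= 0 && index < sharedStrings.Count)
                return sharedStrings[index];
            return string.Empty;
        }
        return raw;
    }
}
```

A caller would iterate with something like foreach (List<string> cells in SaxRowReader.ReadRows(filePath)) and treat the first list from each sheet as the header row. Because only one row is materialized at a time, memory use should stay roughly flat regardless of how many rows the sheet has.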