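I just read a large xlsx file using the DOM approach of the Open XML SDK. It works, but it takes forever. So I want to do the same thing with the SAX approach, but I'm not getting any results at all.

Here is what I do in the DOM approach. For each worksheet in the workbook, I get the worksheet's name. Then I assume that the first row contains all of the column names. Next, I create, on the fly, a class with one property for every column name listed in the first row. After that, I read the remaining rows. For each row, I create a new object of the dynamically created custom class, then iterate over every cell in the row and populate the object with the values I get.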
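The header-to-object mapping described above (first row = column names, one object per data row) can be sketched without `TypeBuilder`. This is a minimal, hypothetical sketch that swaps in `System.Dynamic.ExpandoObject` in place of a runtime-generated class; `RowMapper` and `MapRows` are illustrative names, and the input is assumed to be the cell texts already extracted from one sheet.

```csharp
using System.Collections.Generic;
using System.Dynamic;

public static class RowMapper
{
    // Hypothetical helper: rows[0] is treated as the header row, and every
    // following row becomes an ExpandoObject whose member names are the
    // header texts. ExpandoObject stands in for a TypeBuilder-generated class.
    public static List<dynamic> MapRows(IReadOnlyList<IReadOnlyList<string>> rows)
    {
        var result = new List<dynamic>();
        if (rows.Count == 0)
        {
            return result; // no header row, nothing to map
        }

        IReadOnlyList<string> header = rows[0];
        for (int r = 1; r < rows.Count; r++)
        {
            var item = new ExpandoObject();
            // ExpandoObject implements IDictionary<string, object>, so members
            // can be added by name at run time.
            var members = (IDictionary<string, object>)item;
            for (int c = 0; c < header.Count; c++)
            {
                // A row shorter than the header gets empty strings for the missing cells.
                members[header[c]] = c < rows[r].Count ? rows[r][c] : string.Empty;
            }
            result.Add(item);
        }
        return result;
    }
}
```

Because an `ExpandoObject` is also an `IDictionary<string, object>`, header texts containing spaces or other characters that are not valid C# identifiers can still be used as keys, whereas a generated class would have to sanitize them into property names.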
Below is the code I use to perform the task I just described with the DOM approach.
public static List<Object> ConvertExcelArchiveToListObjects(string filePath)
{
    ...
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
    {
        WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
        Sheets theSheets = wbPart.Workbook.Sheets;
        SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
        ...
        var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
        foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
        {
            Worksheet worksheet = worksheetpart.Worksheet;
            string partRelationshipId = wbPart.GetIdOfPart(worksheetpart);
            var correspondingSheet = sheets.FirstOrDefault(
                s => s.Id.HasValue && s.Id.Value == partRelationshipId);
            Debug.Assert(correspondingSheet != null);
            // Grab the sheet name
            string sheetName = correspondingSheet.GetAttribute("name", "").Value;
            ...
            dynamic expandoObjectClass = new ExpandoObject();
            List<Object> listObjectsCustomClasses = new List<Object>();
            // The generated type is the same for every row of this sheet,
            // so create it and look up its properties once, outside the row loop.
            Type generatedType = typeBuilder.CreateType();
            PropertyInfo[] properties = generatedType.GetProperties();
            foreach (var dataRow in rowContent)
            {
                object generatedObject = Activator.CreateInstance(generatedType);
                int propertiesCounter = 0;
                // Loop over the values that we will assign to the properties
                var rowCells = dataRow.Descendants<Cell>();
                foreach (var rowCell in rowCells)
                {
                    string value = string.Empty;
                    string rawText = rowCell.CellValue?.InnerText;
                    int sharedIndex;
                    if (rowCell.DataType != null
                        && rowCell.DataType.HasValue
                        && rowCell.DataType == CellValues.SharedString
                        && int.TryParse(rawText, out sharedIndex)
                        && sharedIndex < ssTable.ChildElements.Count)
                    {
                        // Shared-string cell: the cell value is an index into the shared string table
                        value = ssTable.ChildElements[sharedIndex].InnerText ?? string.Empty;
                    }
                    else if (rawText != null)
                    {
                        // Any other cell: use the raw cell value text
                        value = rawText;
                    }
                    properties[propertiesCounter].SetValue(generatedObject, value, null);
                    propertiesCounter++;
                }
                listObjectsCustomClasses.Add(generatedObject);
            }
            listObjects.Add(listObjectsCustomClasses);
        }
    }
    DateTime end = DateTime.UtcNow;
    Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
    return listObjects;
}
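However, whenever I read a large xlsx file (more than 30 MB), the method above takes a very long time to run. So I wrote the following code, which should at least get the rows without digging into the cells of each row.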
public static List<Object> ConvertExcelArchiveToListObjectsSAXApproach(string filePath)
{
    DateTime begin = DateTime.UtcNow;
    List<Object> listObjects = new List<Object>();
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
    {
        WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
        Sheets theSheets = wbPart.Workbook.Sheets;
        SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
        SharedStringTable ssTable = null;
        if (sstPart != null)
            ssTable = sstPart.SharedStringTable;
        // Get the CellFormats for cells without defined data types
        WorkbookStylesPart workbookStylesPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<WorkbookStylesPart>().First();
        CellFormats cellFormats = (CellFormats)workbookStylesPart.Stylesheet.CellFormats;
        var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
        foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
        {
            //Worksheet worksheet = worksheetpart.Worksheet;
            OpenXmlPartReader reader = new OpenXmlPartReader(worksheetpart);
            bool firstRow = false;
            while (reader.Read())
            {
                if (reader.ElementType == typeof(Row))
                {
                    ...
                }
                if (reader.ElementType != typeof(Worksheet)) // Don't want to skip the contents of the worksheet
                    reader.Skip(); // Skip contents of any node before finding the first row.
            }
        }
    }
    DateTime end = DateTime.UtcNow;
    Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
    return listObjects;
}
However, the breakpoint I set inside

if (reader.ElementType == typeof(Row))
{
    ...
}

is never even hit. Any ideas what I'm missing? Thanks!
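For comparison, here is a minimal SAX-style sketch, assuming the Open XML SDK's `OpenXmlReader` API (`OpenXmlReader.Create`, `Read`, `IsStartElement`, `Attributes`, `GetText`); `SaxSheetReader`, `ReadWorkbook` and `ReadRows` are hypothetical names. The main difference from the loop in the question is that it never calls `Skip()`. That loop hands every element other than `Worksheet` to `reader.Skip()`, and `SheetData`, the element that contains every `Row`, is not a `Worksheet`, so whenever the reader stops on `SheetData` all of its rows are skipped with it. That is one likely reason the `Row` branch is never reached. The sketch also assumes cells appear in column order with no gaps (sparse rows are not realigned using each cell's `r` reference), and inline-string cells (`t="inlineStr"`) come back empty.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SaxSheetReader
{
    // Hypothetical helper: streams every worksheet and returns
    // sheet name -> rows -> cell texts.
    public static Dictionary<string, List<List<string>>> ReadWorkbook(string filePath)
    {
        var result = new Dictionary<string, List<List<string>>>();
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = doc.WorkbookPart;

            // Load the shared string table once into an array (DOM, but only for this part).
            SharedStringTablePart sstPart = wbPart.SharedStringTablePart;
            string[] sharedStrings = sstPart == null
                ? new string[0]
                : sstPart.SharedStringTable.Elements<SharedStringItem>().Select(s => s.InnerText).ToArray();

            foreach (Sheet sheet in wbPart.Workbook.Sheets.Elements<Sheet>())
            {
                // Sheet.Id is the relationship id of the sheet's part;
                // chart sheets are not WorksheetParts, so they are skipped.
                var wsPart = wbPart.GetPartById(sheet.Id.Value) as WorksheetPart;
                if (wsPart == null)
                {
                    continue;
                }

                using (OpenXmlReader reader = OpenXmlReader.Create(wsPart))
                {
                    result[sheet.Name.Value] = ReadRows(reader, sharedStrings);
                }
            }
        }
        return result;
    }

    // Read() visits every node in document order, including the children of
    // SheetData, so Row, Cell and CellValue start tags are all seen.
    public static List<List<string>> ReadRows(OpenXmlReader reader, string[] sharedStrings)
    {
        var rows = new List<List<string>>();
        bool isSharedString = false;
        while (reader.Read())
        {
            if (!reader.IsStartElement)
            {
                continue; // ignore end tags
            }

            if (reader.ElementType == typeof(Row))
            {
                rows.Add(new List<string>());
            }
            else if (reader.ElementType == typeof(Cell) && rows.Count > 0)
            {
                // t="s" marks a cell whose value is an index into the shared string table.
                isSharedString = reader.Attributes.Any(a => a.LocalName == "t" && a.Value == "s");
                rows[rows.Count - 1].Add(string.Empty); // stays empty if the cell has no <v>
            }
            else if (reader.ElementType == typeof(CellValue) && rows.Count > 0)
            {
                List<string> row = rows[rows.Count - 1];
                string text = reader.GetText();
                int index;
                if (isSharedString && int.TryParse(text, out index) && index >= 0 && index < sharedStrings.Length)
                {
                    text = sharedStrings[index];
                }
                if (row.Count > 0)
                {
                    row[row.Count - 1] = text; // overwrite the placeholder for the current cell
                }
            }
        }
        return rows;
    }
}
```

Only the shared string table is loaded as a DOM here; if a workbook's shared string table is itself very large, that part could be streamed with `OpenXmlReader` in the same way.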