I'm building an application in VB.NET that reads rows from an Excel file and puts them into a DataTable:
dtRow = dataTable.NewRow()
Dim startTime As DateTime = DateTime.Now
dtRow("name") = suppliers.CellValue("A", rowCount)
' SNIP - just more string retrieval
dtRow("statistics") = suppliers.CellValue("P", rowCount)
dataTable.Rows.Add(dtRow)
Dim endTime As DateTime = DateTime.Now
Debug.Print(String.Format("Time elapsed to retrieve '{0}': {1} ms", rowCount, (endTime - startTime).ToString("fffffff")))
CellValue is a function I wrote myself, but it's small and I've timed it on its own: it's fast.
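However, when I open a 10,000-row Excel file (filled with the same data), processing each row is much slower. Here are the per-row timings for files of different sizes: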
3,000 rows:
Time elapsed to retrieve '2': 0510051 ms
Time elapsed to retrieve '3': 0500050 ms
Time elapsed to retrieve '4': 0340034 ms
Time elapsed to retrieve '5': 0350035 ms
Time elapsed to retrieve '6': 0340034 ms
Time elapsed to retrieve '7': 0340034 ms
Time elapsed to retrieve '8': 0350035 ms
6,000 rows:
Time elapsed to retrieve '2': 0710071 ms
Time elapsed to retrieve '3': 0760076 ms
Time elapsed to retrieve '4': 0620062 ms
Time elapsed to retrieve '5': 0670067 ms
Time elapsed to retrieve '6': 0750075 ms
Time elapsed to retrieve '7': 0750075 ms
Time elapsed to retrieve '8': 0700070 ms
10,000 rows:
Time elapsed to retrieve '2': 0920092 ms
Time elapsed to retrieve '3': 0920092 ms
Time elapsed to retrieve '4': 1790179 ms
Time elapsed to retrieve '5': 1810181 ms
Time elapsed to retrieve '6': 1930193 ms
Time elapsed to retrieve '7': 2240224 ms
Time elapsed to retrieve '8': 1820182 ms
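A side note on the timings above: for a TimeSpan, the custom "fffffff" specifier prints only the fractional part of the second, in ten-millionths, so a value logged as 0510051 is about 51 ms rather than 510,051 ms. DateTime.Now can also have a fairly coarse resolution on some systems. The minimal sketch below shows Stopwatch as an alternative for timing one row; Thread.Sleep is only a placeholder for the real work.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Threading

Module TimingSketch
    Sub Main()
        ' Stopwatch is designed for measuring elapsed time and uses a
        ' high-resolution timer when one is available.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Thread.Sleep(50) ' placeholder for the work of filling one DataTable row
        sw.Stop()

        ' TotalMilliseconds really is expressed in milliseconds.
        Console.WriteLine("Elapsed: {0:F3} ms", sw.Elapsed.TotalMilliseconds)

        ' "fffffff" prints only the fraction of a second, in ten-millionths.
        Dim sample As TimeSpan = TimeSpan.FromTicks(510051) ' 0.0510051 s
        Console.WriteLine(sample.ToString("fffffff")) ' prints 0510051
        Console.WriteLine("{0} ms", sample.TotalMilliseconds)
    End Sub
End Module
```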
Why is this happening? Can I fix it?
Edit: suppliers is a class I wrote to handle the Excel file. It uses this constructor:
Public Sub New(ByVal doc As SpreadsheetDocument, ByVal sheetName As String)
    pWorkbookPart = doc.WorkbookPart
    Dim sheet As Sheet = pWorkbookPart.Workbook.Descendants(Of Sheet).Where(Function(s) s.Name = sheetName).FirstOrDefault()
    pWorksheetPart = CType(pWorkbookPart.GetPartById(sheet.Id), WorksheetPart)
    pSharedStringTable = pWorkbookPart.GetPartsOfType(Of SharedStringTablePart).FirstOrDefault()
End Sub
And CellValue:
Public Function CellValue(ByVal column As String, ByVal row As Integer) As String
    Dim cellAddress As String = column & row
    Dim cell As Cell = pWorksheetPart.Worksheet.Descendants(Of Cell).Where(Function(c) c.CellReference = cellAddress).FirstOrDefault()
    Dim index As Integer
    Dim returnValue As String
    If cell IsNot Nothing Then
        If cell.DataType IsNot Nothing Then
            index = Integer.Parse(cell.InnerText)
            returnValue = pSharedStringTable.SharedStringTable.ElementAt(index).InnerText
        Else
            returnValue = CStr(cell.InnerText)
        End If
    End If
    Return returnValue
End Function
Answer 0 (score: 6)
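If your shared string table gets very large, one possible problem is ElementAt: SharedStringTable is not an indexed list, so ElementAt likely has to walk the table from the beginning on every lookup. Since the table doesn't change during your processing, I suggest dropping that lookup and caching the strings in a List(Of String) or an array instead: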
' Use this instead of pSharedStringTable
Dim sharedStringTable As New List(Of String)
' Initialize your string table once
sharedStringTable.AddRange( _
    From xml In pSharedStringTable.SharedStringTable _
    Select xml.InnerText)
' Now sharedStringTable.ElementAt(index) can use the list's indexer directly,
' or you can simply write sharedStringTable(index)
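Why the cached list can help, as a self-contained sketch: Enumerable.ElementAt can use the indexer when its source implements IList(Of T), but for other sequences it generally enumerates from the start up to the requested position. The sketch assumes the shared string table behaves like such a sequence and uses a hypothetical iterator, LazyStrings, as a stand-in for it; it counts how many items one lookup has to produce before and after copying the strings into a List(Of String).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module ElementAtSketch
    ' Number of items the lazy sequence has produced since the last reset.
    Friend produced As Integer = 0

    ' Hypothetical stand-in for a sequence that does not implement IList(Of T).
    Friend Iterator Function LazyStrings(count As Integer) As IEnumerable(Of String)
        For i As Integer = 0 To count - 1
            produced += 1
            Yield "s" & i.ToString()
        Next
    End Function

    Sub Main()
        ' ElementAt on a plain sequence walks from the first item.
        produced = 0
        Dim fromLazy As String = LazyStrings(10000).ElementAt(9000)
        Console.WriteLine("{0} after producing {1} items", fromLazy, produced)
        ' prints: s9000 after producing 9001 items

        ' Copy once, then index the list directly.
        Dim cached As New List(Of String)(LazyStrings(10000))
        produced = 0
        Dim fromList As String = cached(9000)
        Console.WriteLine("{0} after producing {1} items", fromList, produced)
        ' prints: s9000 after producing 0 items
    End Sub
End Module
```

The one-time copy costs a full pass plus memory for the strings, which pays off as soon as you do more than a handful of lookups.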
Another possible problem is the repeated linear search for each cell by its reference. Instead, build a dictionary once:
Dim cells As New Dictionary(Of String, Cell)
For Each c As Cell In pWorksheetPart.Worksheet.Descendants(Of Cell)()
    cells.Add(c.CellReference.InnerText, c)
Next
' This method walks the worksheet's cells only once
In both cases you trade memory for time, which I think is in your best interest here:
' Revised lookup using data structures optimized for common access
Dim cell As Cell = Nothing
If cells.TryGetValue(cellAddress, cell) Then
    If cell.DataType IsNot Nothing Then
        index = Integer.Parse(cell.InnerText)
        returnValue = sharedStringTable(index)
    Else
        returnValue = CStr(cell.InnerText)
    End If
End If
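The answer leaves open where the two caches get built. One option is to fill both once, in the constructor, and keep them as fields. The sketch below does that for a hypothetical CachedSheet class. Because it cannot assume the Open XML SDK is referenced, it takes the raw cell data as hypothetical (reference, isSharedString, innerText) tuples instead of real Cell objects, so it shows the shape of the approach rather than a drop-in replacement for suppliers. The sample values in Main are made up.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the suppliers class: both caches are built once.
Public Class CachedSheet
    Private ReadOnly sharedStrings As List(Of String)
    ' Maps a cell reference such as "B7" to (isSharedString, innerText).
    Private ReadOnly cells As New Dictionary(Of String, Tuple(Of Boolean, String))

    Public Sub New(strings As IEnumerable(Of String),
                   rawCells As IEnumerable(Of Tuple(Of String, Boolean, String)))
        sharedStrings = New List(Of String)(strings)
        For Each raw As Tuple(Of String, Boolean, String) In rawCells
            ' Add throws on a duplicate reference, which a valid sheet should not contain.
            cells.Add(raw.Item1, Tuple.Create(raw.Item2, raw.Item3))
        Next
    End Sub

    ' Average O(1) dictionary lookup instead of scanning every cell.
    Public Function CellValue(column As String, row As Integer) As String
        Dim entry As Tuple(Of Boolean, String) = Nothing
        If Not cells.TryGetValue(column & row.ToString(), entry) Then Return Nothing
        If entry.Item1 Then
            ' Shared-string cells store an index into the shared string table.
            Return sharedStrings(Integer.Parse(entry.Item2))
        End If
        Return entry.Item2
    End Function
End Class

Module CachedSheetDemo
    Sub Main()
        Dim sheet As New CachedSheet(
            {"Acme Ltd", "Globex"},
            {Tuple.Create("A2", True, "1"), Tuple.Create("P2", False, "42")})
        Console.WriteLine(sheet.CellValue("A", 2)) ' prints Globex
        Console.WriteLine(sheet.CellValue("P", 2)) ' prints 42
        Console.WriteLine(sheet.CellValue("C", 9) Is Nothing) ' prints True
    End Sub
End Module
```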
Answer 1 (score: 4)
This line looks suspicious:
Dim cell As Cell = pWorksheetPart.Worksheet.Descendants(Of Cell).Where(Function(c) c.CellReference = cellAddress).FirstOrDefault()
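The .Where() condition is evaluated against the spreadsheet's cells one at a time until it finds a match, and CellValue runs that search again for every cell you read. As the number of rows grows, so does the number of cell-address comparisons (up to rows × columns per search). Even though each cell-reference comparison is very cheap, it adds up.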
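A rough model of that cost, counting predicate calls instead of measuring time: the sketch below builds hypothetical references for columns A to P (the range the question reads) in row-major order, which assumes cells are stored row by row, and counts the comparisons a Where(...).FirstOrDefault() search needs to reach the last cell. Because CellValue repeats the search for every cell, reading a whole sheet of N cells this way costs roughly 1 + 2 + ... + N, about N²/2 comparisons, so the total grows quadratically with the row count.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinearScanSketch
    ' Builds references "A1".."P{rows}" in row-major order (hypothetical layout).
    Friend Function BuildReferences(rows As Integer) As List(Of String)
        Dim refs As New List(Of String)()
        For r As Integer = 1 To rows
            For c As Integer = 0 To 15
                refs.Add(ChrW(AscW("A"c) + c) & r.ToString())
            Next
        Next
        Return refs
    End Function

    ' Counts predicate calls needed to find one reference with a linear search.
    Friend Function ComparisonsToFind(refs As List(Of String), target As String) As Integer
        Dim comparisons As Integer = 0
        Dim found As String = refs.Where(Function(x)
                                             comparisons += 1
                                             Return x = target
                                         End Function).FirstOrDefault()
        Return comparisons
    End Function

    Sub Main()
        For Each rows As Integer In {3000, 6000, 10000}
            Dim refs As List(Of String) = BuildReferences(rows)
            Dim n As Integer = ComparisonsToFind(refs, "P" & rows.ToString())
            Console.WriteLine("{0} rows: {1} comparisons for the last cell", rows, n)
        Next
    End Sub
End Module
```

With 16 columns this prints 48000, 96000 and 160000 comparisons for 3,000, 6,000 and 10,000 rows. A Dictionary lookup, by contrast, does not depend on the cell's position.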
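If the OpenXML or workbook classes you're using don't offer convenient x,y cell addressing, you may have to build your own index. Make one pass over all the cells to add them to your own list of column lists; after that you can drop the search and index by x,y: x = index into the list of columns to get a column list, y = index into that column list to get the cell.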
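A minimal sketch of that idea, with hypothetical names (GridIndex, ColumnIndex, ValueAt) and plain strings standing in for Cell objects: one pass parses each reference such as "C12" into a zero-based column index and row index and stores the value in a list of column lists, so a later lookup is two list indexings instead of a search. It assumes references are simple column-letters-plus-row-number strings.

```vbnet
Imports System
Imports System.Collections.Generic

Public Class GridIndex
    ' columns(x)(y) holds the value of the cell at column x, row y (both zero-based).
    Private ReadOnly columns As New List(Of List(Of String))()

    ' Converts column letters ("A", "P", "AA") to a zero-based index.
    Public Shared Function ColumnIndex(letters As String) As Integer
        Dim result As Integer = 0
        For Each ch As Char In letters
            result = result * 26 + (AscW(Char.ToUpperInvariant(ch)) - AscW("A"c) + 1)
        Next
        Return result - 1
    End Function

    ' Single pass over (reference, value) pairs, e.g. ("C12", "foo").
    Public Sub New(cells As IEnumerable(Of KeyValuePair(Of String, String)))
        For Each kv As KeyValuePair(Of String, String) In cells
            ' Split the reference into its letter and digit parts.
            Dim pos As Integer = 0
            While pos < kv.Key.Length AndAlso Char.IsLetter(kv.Key(pos))
                pos += 1
            End While
            Dim x As Integer = ColumnIndex(kv.Key.Substring(0, pos))
            Dim y As Integer = Integer.Parse(kv.Key.Substring(pos)) - 1
            ' Grow the outer and inner lists as needed; gaps stay Nothing.
            While columns.Count <= x
                columns.Add(New List(Of String)())
            End While
            Dim column As List(Of String) = columns(x)
            While column.Count <= y
                column.Add(Nothing)
            End While
            column(y) = kv.Value
        Next
    End Sub

    ' x = index into the list of columns, y = index into that column.
    Public Function ValueAt(x As Integer, y As Integer) As String
        If x < 0 OrElse x >= columns.Count Then Return Nothing
        Dim column As List(Of String) = columns(x)
        If y < 0 OrElse y >= column.Count Then Return Nothing
        Return column(y)
    End Function
End Class

Module GridIndexDemo
    Sub Main()
        Dim grid As New GridIndex({
            New KeyValuePair(Of String, String)("A1", "name"),
            New KeyValuePair(Of String, String)("P3", "stats")})
        Console.WriteLine(grid.ValueAt(GridIndex.ColumnIndex("P"), 2)) ' prints stats
        Console.WriteLine(grid.ValueAt(0, 0)) ' prints name
    End Sub
End Module
```

Compared with a Dictionary keyed by reference, this layout avoids hashing strings on every lookup but wastes slots when the sheet has large gaps.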