我有一个具有以下结构的Excel,我想将其转换为另一个结构:
实际文件比这复杂得多 - 但我创建了这个原理图来描述问题的本质。该文件目前有大约5K行,但预计包含大约50K-100K条目。因此,解决方案应该具有良好的性能。
我想到的是
我的问题是:
我做过的家庭作业:
我进行了谷歌搜索,试图阅读尽可能多的文章。我还在Stackoverflow中阅读了以下主题,但没有一个具有我正在寻找的答案
Converting Excel rows to columns (smarter than transpose)
答案 0 :(得分:1)
我认为您应该考虑将此应用程序转移到Access或其他数据库。本答案的其余部分假定目前无法做到这一点。
您正在考虑的方法有一个缓慢的循环:
With Worksheets("Input")
Cut
End With
With Worksheets("Output")
Paste
End With
我会:
如果您需要示例代码,我很乐意提供一些代码。我可以为您的示例表编写代码,但实际工作表的某些特性会为您提供更多有用的代码,而不需要额外的工作。
第2部分
你说“实际文件比这复杂得多 - 但我创建了这个原理图来描述问题的本质。”
我假设:
我创建了工作表输入并创建了20个数据行。我向下复制数据行3到22以创建5,000个数据行。我认为这是对数据的公平表示:
宏的输出在工作表输出中:
这是我相信你寻求的。我按照规定重新安排了数据。我已经复制了标题行的格式,列宽和数字格式。如果输入中有公式,则它们将是输出中的值。
对于5,000行,宏需要大约.1秒来复制数据,大约需要0.05秒来应用格式化。
在代码中我已经包含了评论来说明我正在做什么以及为什么我这样做但是没有很多评论解释VBA语句。例如,第一个语句是Option Explicit
。在VB帮助中很容易查找,或者您可以在互联网上搜索“Excel VBA Option Explicit”。如有必要,请回答问题。
希望这有帮助。
Option Explicit
Sub Reformat()
Dim CellHeaderColourBack As Long
Dim CellHeaderColourFore As Long
Dim CellHeaderBold As Boolean
Dim CellInValue() As Variant
Dim CellOutHeaderHAlign() As Long
Dim CellOutNumberFormat() As String
Dim CellOutValue() As Variant
Dim ColInCrnt As Long
Dim ColInCrnt2 As Long
Dim ColInMax As Long
Dim ColOutCrnt As Long
Dim ColOutMax As Long
Dim ColWidth() As Single
Dim NumRowsData As Long
Dim RowInCrnt As Long
Dim RowInMax As Long
Dim RowOutCrnt As Long
Dim RowOutMax As Long
Dim TimeStart As Single
' I use constants to define values that might change. For example, you have
' two header rows so the first data row is 3.
' "For RowCrnt = RowDataFirst to RowMax" instead of
' "For RowCrnt = 3 to RowMax"
' makes the code easier to understand and makes it easy to update the code
' if you add another header row.
Const RowDataFirst As Long = 3 ' First data row
Const NumNonYearCols As Long = 4 ' Number of columns not linked to a year
Const NumColsPerYear As Long = 2 ' Number of columns per year
TimeStart = Timer ' Seconds since midnight
With Worksheets("Input")
' There are several ways of identifying the last column and the last row.
' None work in every situation. I think this method should be satisfactory
' for your worksheet although there is a warning later about ColMax.
ColInMax = .Cells.SpecialCells(xlCellTypeLastCell).Column
RowInMax = .Cells.SpecialCells(xlCellTypeLastCell).Row
' Debug.Print output to the Immediate Window. I have left diagnostic
' outputs within the code. Delete once you have adapted the code to
' your requirements.
Debug.Print "ColInMax=" & ColInMax & " RowInMax=" & RowInMax
' I never did much programming in C++ or Java but I never used a language
' that did not have an Assert statement of some kind.
' A key assumption of the code is that the the number of columns is of the
' form: NumNonYearCols + NunYears * NumColsPerYear.
' The interpreter will stop on this statement if this assumption is untrue.
' If the interpreter does stop even though you think the assumption is true,
' you will probably have a stray value or formatted cell to the right of the
' main data table. Try deleting columns to the right of the data table.
' Alternatively, set ColInMax = NumNonYearCols + NumYears * NumColsPerYear
' so the extract ignores anything outside the data table.
Debug.Assert (ColInMax - NumNonYearCols) Mod NumColsPerYear = 0
' Load all values within the worksheet to the array CellValue.
CellInValue = .Range(.Cells(1, 1), .Cells(RowInMax, ColInMax)).Value
' CellInValue will now be a two dimensional array. Dimension 1 will be for
' rows and dimension 2 will be for columns. This is not conventional for
' arrays but matches the VBA for accessing cells.
' The lower bound for both dimensions will be 1.
' Record the formatting of cell A1 so this can be applied to all header
' cells in worksheet Output. If the formatting is more complicated than
' this, it will probably be easier to copy and paste the header rows from
' the input to the output worksheet.
With .Cells(1, 1)
CellHeaderColourBack = .Interior.Color
CellHeaderColourFore = .Font.Color
' Warning the bold state of a cell will be non-boolean if
' some characters are bold and some are not.
CellHeaderBold = .Font.Bold
End With
' Calculate number of columns in worksheet Output
ColOutMax = NumNonYearCols + 1 + NumColsPerYear
' Record column widths and number formats for first data row and horizontal
' alignment for last header row.
' The column widths will be applied to the relevant output columns
' The number formats will be applied to data cells in the relevant
' output column.
' The horizontal alignments will be applied to header cells in the
' relevant output column.
ReDim ColWidth(1 To ColOutMax)
ReDim CellOutNumberFormat(1 To ColOutMax)
ReDim CellOutHeaderHAlign(1 To ColOutMax)
ColOutCrnt = 1
' Non-year-linked columns
For ColInCrnt = 1 To NumNonYearCols
ColWidth(ColOutCrnt) = .Columns(ColInCrnt).ColumnWidth
CellOutNumberFormat(ColOutCrnt) = _
.Cells(RowDataFirst, ColInCrnt).NumberFormat
CellOutHeaderHAlign(ColOutCrnt) = _
.Cells(RowDataFirst - 1, ColInCrnt).HorizontalAlignment
ColOutCrnt = ColOutCrnt + 1
Next
' Year column
ColWidth(ColOutCrnt) = 5
CellOutNumberFormat(ColOutCrnt) = "General"
CellOutHeaderHAlign(ColOutCrnt) = xlRight
ColOutCrnt = ColOutCrnt + 1
' Year-linked columns
For ColInCrnt = NumNonYearCols + 1 To NumNonYearCols + NumColsPerYear
ColWidth(ColOutCrnt) = .Columns(ColInCrnt).ColumnWidth
CellOutNumberFormat(ColOutCrnt) = _
.Cells(RowDataFirst, ColInCrnt).NumberFormat
CellOutHeaderHAlign(ColOutCrnt) = _
.Cells(RowDataFirst - 1, ColInCrnt).HorizontalAlignment
ColOutCrnt = ColOutCrnt + 1
Next
End With
' I have now extracted everything I want from worksheet Input.
' Worksheet Output will have 1 data row per value in a Quantity column.
' Count these values.
NumRowsData = 0
For RowInCrnt = RowDataFirst To RowInMax
For ColInCrnt = NumNonYearCols + 1 To ColInMax Step NumColsPerYear
If CellInValue(RowInCrnt, ColInCrnt) <> "" Then
NumRowsData = NumRowsData + 1
End If
Next
Next
Debug.Print NumRowsData
' Size CellOutValue so it can hold all the data for Worksheet Output.
' ColOutMax = NumNonYearCols + 1 + NumColsPerYear ' Calculated earlier
RowOutMax = RowDataFirst - 1 + NumRowsData
ReDim CellOutValue(1 To RowOutMax, 1 To ColOutMax)
' Build new header rows.
' Copy header cells for non-year-linked columns
RowOutCrnt = 1
For RowInCrnt = 1 To RowDataFirst - 1
ColOutCrnt = 1
For ColInCrnt = 1 To NumNonYearCols
CellOutValue(RowOutCrnt, ColOutCrnt) = CellInValue(RowInCrnt, ColInCrnt)
ColOutCrnt = ColOutCrnt + 1
Next
RowOutCrnt = RowOutCrnt + 1
Next
' Create header for new column
CellOutValue(RowDataFirst - 1, ColOutCrnt) = "Year"
' Copy one set of year-linked column header cells
RowOutCrnt = 2 ' Row 1 holds year numbers
For RowInCrnt = 2 To RowDataFirst - 1
ColOutCrnt = NumNonYearCols + 2
For ColInCrnt = NumNonYearCols + 1 To NumNonYearCols + NumColsPerYear
CellOutValue(RowOutCrnt, ColOutCrnt) = _
CellInValue(RowInCrnt, ColInCrnt)
ColOutCrnt = ColOutCrnt + 1
Next
RowOutCrnt = RowOutCrnt + 1
Next
' Copy data
RowOutCrnt = RowDataFirst
For RowInCrnt = RowDataFirst To RowInMax
For ColInCrnt = NumNonYearCols + 1 To ColInMax Step NumColsPerYear
' This for-loop tracks the first column of each block of year columns
If CellInValue(RowInCrnt, ColInCrnt) <> "" Then
' There is data for this year for this customer/product
' Copy non-year-linked data
ColOutCrnt = 1
For ColInCrnt2 = 1 To NumNonYearCols
CellOutValue(RowOutCrnt, ColOutCrnt) = _
CellInValue(RowInCrnt, ColInCrnt2)
ColOutCrnt = ColOutCrnt + 1
Next
' Copy year
CellOutValue(RowOutCrnt, ColOutCrnt) = CellInValue(1, ColInCrnt)
ColOutCrnt = ColOutCrnt + 1
' Copy year-linked data
For ColInCrnt2 = ColInCrnt To ColInCrnt + NumColsPerYear - 1
CellOutValue(RowOutCrnt, ColOutCrnt) = _
CellInValue(RowInCrnt, ColInCrnt2)
ColOutCrnt = ColOutCrnt + 1
Next
RowOutCrnt = RowOutCrnt + 1
End If
Next
Next
With Worksheets("Output")
' Delete any existing value
.Cells.EntireRow.Delete
' Download contents of CellOutValue
.Range(.Cells(1, 1), .Cells(RowOutMax, ColOutMax)).Value = CellOutValue
'Set formatting. Selection formats from the input worksheet were saved at
' the beginning. Applying these formats to the output worksheet is not
' necessary but makes the process a little smoother.
For RowOutCrnt = 1 To RowDataFirst - 1
For ColOutCrnt = 1 To ColOutMax
With .Cells(RowOutCrnt, ColOutCrnt)
.Interior.Color = CellHeaderColourBack
.Font.Color = CellHeaderColourFore
.Font.Bold = CellHeaderBold
.HorizontalAlignment = CellOutHeaderHAlign(ColOutCrnt)
End With
Next
Next
For ColOutCrnt = 1 To ColOutMax
.Columns(ColOutCrnt).ColumnWidth = ColWidth(ColOutCrnt)
.Range(.Cells(RowDataFirst, ColOutCrnt), _
.Cells(RowOutMax, ColOutCrnt)).NumberFormat _
= CellOutNumberFormat(ColOutCrnt)
Next
End With
Debug.Print "Duration " & Timer - TimeStart
End Sub
答案 1 :(得分:0)
我会使用另一种方法:我只会添加一个缺少的列并删除不需要的列。
我的许多宏开始创建或导入包含所有数据的表,然后复制,排序,删除顶部或底部的不需要的行(如果排序足够智能,所有所需的行将被分组在一起),删除不需要的列,添加几列,格式化并重复所有工作表。