我的问题类似但比这个帖子How to Consolidate Data from Multiple Excel Columns All into One Column更复杂。
以下是excel示例
Date Measure1 A B Date Measure2 A B C Date.....
11/11/11 1234 1 2 11/12/12 5678 1 3 3 12/12/12 ...
12/11/12 234 34 234 12/12/13 345 342 23 33 12/12/13 ...
........
excel中有数百列。一个日期列后跟一个测量列,然后是其他一些列。 现在我只想要日期列,度量名称列和值列。 结果excel文件应该是
Date Measure Name Value
11/11/11 Measure1 1234
11/12/12 Measure2 5678
12/12/12 ....
....
12/11/12 Measure1 234
12/12/13 Measure2 123
我怎么能通过VBA实现它?由于我有这样的数千个文件,VBA似乎是整合这些文件并加载到数据库中的最佳方式。
我总是得到
Run-time error '1004'
Application -defined or object -defined eror"
at
w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2
这是我的代码
Sub convertExcel()
Dim Arr1, Arr2()
Dim Rnum As Integer, Cnum As Integer, Tnum As Integer
Dim i As Integer, j As Integer, k As Integer
'Rnum = row number; Cnum = column number; Tnum as total number
Application.ScreenUpdating = False
Set w = Workbooks.Open("FileNAME~~~~")
Rnum = w.Sheets("Data").Cells(Rows.Count, 1).End(xlUp).Row
Cnum=208
Tnum = Rnum * Cnum / 2
w.Sheets.Add.Name = "DataSort"
Arr1 = Range("A1:GZ" & Rnum)
ReDim Arr2(1 To Tnum, 1 To 3)
For j = 2 To Cnum
If w.Sheets("Data").Cells(1, j) = "Date" Then
For i = 2 To Rnum
If Arr1(i, j) <> "" Then
k = k + 1:
Arr2(k, 1) = Arr1(i, j)
Arr2(k, 2) = Arr1(1, j)
Arr2(k, 3) = Arr1(i, j + 1)
End If
Next
End If
Next
w.Sheets("DataSort").Range("A1").Resize(k, UBound(Arr2, 2)) = Arr2
w.Close True
Application.ScreenUpdating = True
End Sub
答案 0 :(得分:1)
由于今天我手上有很多时间,所以我决定在这里投入一些时间。我发现它有点挑战性,但最后,它只是恰当的事件排序。
以下是我采用的逻辑:
Date
和非MeasureX
列。Measure
的列名存储在字典中(完全不必要,但是,嘿,它很快)作为键。请仔细阅读代码中的所有评论。另外,请注意我的设置如下。最后,在您的工作簿副本上进行测试。
设置向上:强>
在Sheet2
中,我有一个 unabridged 数据集,大致从您的示例中复制了1508列和1500行数据,不包括标题。删除不需要的列后,数据将减少到734列和1500行数据。在测试时,删除大约需要11-13秒。您的里程可能会有所不同。
使用这个过滤后的数据,使用第二个字典处理它需要大约8-9秒才能完成。整个过程基本上完成约20秒。
<强>截图:强>
Sheet2(包含原始数据的工作表):
Sheet3(输出表):
<强> 代码: 强>
Sub KamehameWave()
Dim Sht2 As Worksheet, Sht3 As Worksheet
Dim Dict As Object, Cell As Range
Dim Dict2 As Object, Cell2 As Range
Dim RngToDelete As Range
Set Sht2 = ThisWorkbook.Sheets("Sheet2") 'Modify accordingly.
Set Sht3 = ThisWorkbook.Sheets("Sheet3") 'Modify accordingly.
Application.ScreenUpdating = False
With Sht2
'-----------------------------------BK201's Notes-----------------------------------'
' The following block will delete unneeded columns. Basically, it will only keep '
' columns that either have "Date" or "MeasureX" in their headers. All else will be '
' discarded. As said in the post, do this on a copy of your worksheet. '
'-----------------------------------BK201's Notes-----------------------------------'
Start = Timer()
For Each Cell In .Rows(1).Cells
If InStr(1, Cell.Value, "Date") = 0 And InStr(1, Cell.Value, "Measure") = 0 Then
If Not RngToDelete Is Nothing Then
Set RngToDelete = Union(RngToDelete, .Columns(Cell.Column))
Else
Set RngToDelete = .Columns(Cell.Column)
End If
End If
Next Cell
RngToDelete.Delete
Debug.Print Timer() - Start
Start = Timer()
'-----------------------------------BK201's Notes-----------------------------------'
' The following block will create a dictionary and store all the names of columns '
' with "Measure" in them. This is just so you have a reference. An array or a '
' collection will do as well. I prefer to use this though as I find it easier. '
'-----------------------------------BK201's Notes-----------------------------------'
Set Dict = CreateObject("Scripting.Dictionary")
For Each Cell In .Rows(1).Cells
CheckIfMeasure = InStr(1, Cell.Value, "Measure")
If CheckIfMeasure > 0 Then
If Not Dict.Exists(Cell.Value) And Not IsEmpty(Cell.Value) Then
Dict.Add Cell.Value, Empty
End If
End If
Next Cell
'-----------------------------------BK201's Notes-----------------------------------'
' What we'll do next is to iterate over each "MeasureX" column. We'll iterate over '
' the values on these columns and store them in a *second* dictionary, with their '
' respective dates being the keys. '
'-----------------------------------BK201's Notes-----------------------------------'
For Each Key In Dict
MColIndex = Application.Match(Key, .Rows(1), 0)
MColLRow = .Cells(Rows.Count, MColIndex).End(xlUp).Row
Set MCol = .Range(.Cells(2, MColIndex), .Cells(MColLRow, MColIndex))
Set Dict2 = CreateObject("Scripting.Dictionary")
For Each Cell2 In MCol
If Not Dict2.Exists(Cell2.Value) And Not IsEmpty(Cell2.Value) Then
Dict2.Add Cell2.Offset(0, -1).Value, Cell2.Value
End If
Next Cell2
'-----------------------------------BK201's Notes-----------------------------------'
' The final phase is to get the next empty row in the output sheet and dump all the '
' key-value pairs from our second dictionary there. Since we're also iterating '
' through the keys of the first dictionary, the list will append properly to '
' accommodate each key's own dictionary. '
'-----------------------------------BK201's Notes-----------------------------------'
TColNRow = Sht3.Range("A" & Rows.Count).End(xlUp).Row + 1
Sht3.Range("A" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Keys)
Sht3.Range("B" & TColNRow).Resize(Dict2.Count, 1).Value = Key
Sht3.Range("C" & TColNRow).Resize(Dict2.Count, 1).Value = Application.Transpose(Dict2.Items)
Next Key
Debug.Print Timer() - Start
End With
Application.ScreenUpdating = True
End Sub
运行代码后的结果:
第一个数字是删除的运行时间,第二个是转置的运行时间。考虑到我有50万个数据点,这还不错。排序数据在你的法庭上。
如果有帮助,请告诉我们。