Collecting and subtotaling duplicate rows in a Variant 2D array

Time: 2015-04-29 10:52:27

Tags: excel-vba multidimensional-array vba excel

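I have a data set in Excel with 20 columns and a varying number of rows, ranging from 20,000 to 50,000.

Each row is a collection of items, with one column stating the number of items in the collection and another column stating the total combined weight of the collection. Some of the rows are completely identical in all columns, while others are identical in everything except quantity and weight.

I want to create a macro that runs through the data set and "stacks" the rows that are duplicates on all parameters other than quantity and weight, summing those two values.

In other words, a macro that transforms this: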

|Param1|Param2|...|Param18|Quantity|Weight|
| A    | 1    |...| C     | 5      | 12.5 |
| A    | 1    |...| C     | 2      |  5.0 |
| A    | 1    |...| C     | 3      |  7.5 |
| B    | 2    |...| C     | 1      |  2.3 |
| B    | 2    |...| C     | 2      |  4.6 |

into this:

|Param1|Param2|...|Param18|Quantity|Weight|
| A    | 1    |...| C     | 10     | 25.0 |
| B    | 2    |...| C     | 3      |  6.9 |

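I know this could be done with a simple pivot table, but for various reasons that is not feasible in this case.

Since I am dealing with large data sets, I want to load everything into memory rather than reading and writing row by row, to speed things up (as suggested in tip #13 of this article). However, I am confused about how to perform row operations on data stored in memory.

So far my code looks like this: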

Dim r, c, LastRow As Integer
Dim temp_range As Variant

LastRow = Cells(65536, 2).End(xlUp).Row

'Load the data set into memory
temp_range = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 20)).Value

'Run through the data set from bottom to top and bulk identical rows together
For r = UBound(temp_range) To LBound(temp_range)
    For i = r - 1 To LBound(temp_range)

        'PSEUDO CODE START HERE
        If row temp_range(r) = row temp_range(i) Then
            temp_range(i,19) = temp_range(r,19) + temp_range(i,19)
            temp_range(i,20) = temp_range(r,19) + temp_range(i,20)
            Delete row temp_range(r)
            Exit For
        End if

        'PSEUDO CODE END HERE

    Next i
Next r

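I am stuck on the pseudo-code part of the code. I simply do not know how to compare rows, how to add the quantity and weight from one row to another, and how to then delete the duplicate row from the variant while keeping the range in memory.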

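3 answers:

Answer 0: (score: 0)

Use Microsoft Query (SQL) in Excel:

This kind of data manipulation is well suited to an SQL query. There is no need to walk through the data row by row in VBA: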

SELECT S1.Param1, S1.Param2, S1.Param18, SUM(S1.Quantity), SUM(S1.Weight) 
FROM [Sheet1$] AS S1 GROUP BY Param1, Param2,Param18

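The query runs through OLE DB, which is fast and efficient. Equivalent VBA code that loops over the rows will generally be much less efficient.

To refresh the query, simply run the following code from VBA: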

Set ws = ActiveSheet
ws.QueryTables(1).Refresh BackgroundQuery:=False

ws is the worksheet that contains the query table.

Create the query via Data -> From Other Sources -> From Microsoft Query, or feel free to use my add-in.

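Answer 1: (score: 0)

I found this article (http://sitestory.dk/excel_vba/arrays-and-ranges.htm, go to the middle of the page) describing how to delete identical rows, and I built some code on top of it. It does not solve my original problem 100%, because it does not compare whole rows of the array at once but instead compares each pair of rows one column at a time. Even so, it turned out to perform reasonably well.

Here is the code: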

'I couldn't get the final step of pasting the output array into the new sheet working properly without declaring this option, otherwise the data would be pasted one cell to the right and below where I wanted it.
Option Base 1

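'Declare every counter explicitly as Long: in "Dim a, b As Integer" only b is typed, and Integer overflows above 32,767 rows
Dim r As Long, i As Long, c As Long, LastRow As Long, DeletedRows As Long
Dim input_array As Variant, output_array As Variant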
Dim identical As Boolean
Dim s As Worksheet
Dim NewRange As Range

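'Find the last used row in column 2 of the data sheet itself, not of whichever sheet happens to be active
LastRow = Sheets("1.1").Cells(Sheets("1.1").Rows.Count, 2).End(xlUp).Row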

'Load the data set into memory, consisting of 20 columns of data and a 21st column with no data that is used for marking rows for deletion.
input_array = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 21)).Value

DeletedRows = 0

'Run through the data set from bottom to top comparing rows one at a time, copy Quantity and Weight values and mark rows for deletion
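'Array row 1 is the first data row (sheet row 2), so the inner loop must reach index 1 or that row can never receive a merge
For r = UBound(input_array) To 2 Step -1
    For i = r - 1 To 1 Step -1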

        'Assume row r and i are identical
        identical = True

        'Compare columns 1 to 17 of rows r and i; flag and stop at the first mismatch (col 18 is skipped because it is the "Quantity" column, while col 20 is the "Weight" column)
        For c = 1 To 17
            If input_array(r, c) <> input_array(i, c) Then
                identical = False
                Exit For
            End If
        Next c

        ' If no non-identical columns were found, add "Quantity" and "Weight" from row r to row i and mark row r for deletion
        If identical Then
            input_array(i, 18) = input_array(i, 18) + input_array(r, 18)
            input_array(i, 20) = input_array(i, 20) + input_array(r, 20)
            input_array(r, 21) = "_DELETE_"
            DeletedRows = DeletedRows + 1
            Exit For
        End If

    Next i
Next r

' Resize the new array to the size of the old array minus the number of deleted rows
ReDim output_array(1 To UBound(input_array) - DeletedRows, 1 To 20)

' Copy rows not marked as deleted from old array to new array
i = 1
For r = 1 To UBound(input_array)
    If input_array(r, 21) <> "_DELETE_" Then
        For c = 1 To 20
            output_array(i, c) = input_array(r, c)
        Next c
        i = i + 1
    End If
Next r

' Create a new sheet and write the output array to it in a single operation
Set s = Sheets.Add
Set NewRange = s.Range("A2").Resize(UBound(output_array), 20)
NewRange.Value = output_array

On my computer the macro takes about 30 seconds to reduce 20,000 rows to 3,000 rows, which I think is quite good considering the amount of data.
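The column-by-column test in the inner loop can also be factored into a small helper, which is roughly what the "If row r = row i" line of the question's pseudo-code asks for. The following is only a sketch: the name RowsMatch and its parameters are hypothetical and appear in neither the question nor the linked article.

```vba
' Hypothetical helper: returns True when rows r1 and r2 of a 2D Variant
' array hold equal values in every column from firstCol to lastCol.
Function RowsMatch(arr As Variant, ByVal r1 As Long, ByVal r2 As Long, _
                   ByVal firstCol As Long, ByVal lastCol As Long) As Boolean
    Dim c As Long
    For c = firstCol To lastCol
        ' Leave at the first mismatch; the function result is still False
        If arr(r1, c) <> arr(r2, c) Then Exit Function
    Next c
    RowsMatch = True
End Function
```

With this in place, the comparison in the loop above could be written as `If RowsMatch(input_array, r, i, 1, 17) Then`. The number of comparisons is unchanged, so this tidies the code without making it faster.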

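Answer 2: (score: 0)

This can be done in milliseconds rather than seconds. I am also using arrays, together with a Dictionary object, but in a simpler and more deliberate implementation. It is faster than running SQL against the sheet. It can compare any number of columns; just make sure they are all included in the KeyIn variable as a concatenated string. I also simplified the function by assuming that the value to sum is in column 4. You can adapt the code for multiple values in other columns.

I write from one array to another (from InAy to OutAy), and the dictionary determines whether a row already exists. The magic happens in the dictionary's Item property. When a new OutAy row is written, I store the row number (r) as the item value. Later, when the key already exists, I use the key to retrieve the row (r) it was written to in OutAy: d.item(KeyIn). The value in OutAy(r, 4) can then be updated with the sum of the existing value and the new value, KeyVal.

This solves the same problem as the SQL aggregate query "select a, b, c, sum(d) from data group by a, b, c".

Note: add a reference to Microsoft Scripting Runtime via Tools -> References.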

Sub SomeSub()
    ...
    data = Range("WhereYourDataIs").Value 'create data array
    Range("WhereYourDataIs").Clear 'assumes you'll output to same location
    data = RemoveDupes(data) 'remove dupes and sum values
    Range("A2").Resize(UBound(data), UBound(data, 2)).Value = data 'A2 assumes your data headers begin in row 1, column 1
    ...
End Sub

Function RemoveDupes(InAy As Variant) As Variant
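    Dim d As Scripting.Dictionary
    Dim OutAy() As Variant
    Dim KeyIn As String, KeyVal As Variant
    Dim r As Long, i As Long, c As Long
    Set d = New Scripting.Dictionary
    ReDim OutAy(1 To UBound(InAy), 1 To 4)
    r = 1 'next free row in OutAy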

    For i = 1 To UBound(InAy)
        KeyIn = ""
        KeyVal = InAy(i, 4) 'the value field to sum/aggregate if exists
        For c = 1 To 3 'a, b, c metadata to roll up
            KeyIn = KeyIn & InAy(i, c) & "|" 'delimiter keeps "AB" & "C" distinct from "A" & "BC"; pick one that cannot occur in the data
        Next c
        If d.Exists(KeyIn) Then
            'd.Item(KeyIn) is the OutAy row (r) stored when this key was first added
            OutAy(d.Item(KeyIn), 4) = OutAy(d.Item(KeyIn), 4) + KeyVal
        Else
            d.Add KeyIn, r 'store the OutAy row for this key; it is looked up when .Exists is True
            For c = 1 To 4
                OutAy(r, c) = InAy(i, c)
            Next c
            r = r + 1
        End If
    Next
    RemoveDupes = OutAy
End Function
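The answer notes that the function can be adapted to several summed columns. Below is a minimal sketch of one way to do that for the question's layout (18 key columns, Quantity in column 19, Weight in column 20). The names StackRows, keyCols, sumCols and Demo are hypothetical. The sketch assumes a 1-based array with at least one row, as returned by Range.Value, and it uses late binding so that no library reference is required.

```vba
Option Explicit

' Sketch: collapse rows that match on the first keyCols columns and add up
' the columns listed in sumCols. Assumes InAy is 1-based with at least one row.
Function StackRows(InAy As Variant, ByVal keyCols As Long, sumCols As Variant) As Variant
    Dim d As Object
    Dim OutAy() As Variant, Trimmed() As Variant
    Dim nCols As Long, r As Long, i As Long, c As Long, k As Long, hit As Long
    Dim KeyIn As String

    Set d = CreateObject("Scripting.Dictionary") ' late binding
    nCols = UBound(InAy, 2)
    ReDim OutAy(1 To UBound(InAy, 1), 1 To nCols)

    For i = 1 To UBound(InAy, 1)
        ' Build a delimited key from the key columns of row i
        KeyIn = vbNullString
        For c = 1 To keyCols
            KeyIn = KeyIn & CStr(InAy(i, c)) & "|"
        Next c

        If d.Exists(KeyIn) Then
            hit = d(KeyIn) ' output row where this key was first written
            For k = LBound(sumCols) To UBound(sumCols)
                OutAy(hit, sumCols(k)) = OutAy(hit, sumCols(k)) + InAy(i, sumCols(k))
            Next k
        Else
            r = r + 1
            d.Add KeyIn, r
            For c = 1 To nCols
                OutAy(r, c) = InAy(i, c)
            Next c
        End If
    Next i

    ' Copy the r used rows into a right-sized array
    ' (ReDim Preserve can only resize the last dimension)
    ReDim Trimmed(1 To r, 1 To nCols)
    For i = 1 To r
        For c = 1 To nCols
            Trimmed(i, c) = OutAy(i, c)
        Next c
    Next i
    StackRows = Trimmed
End Function

' Hypothetical usage with the sheet name from the question
Sub Demo()
    Dim data As Variant
    With Sheets("1.1")
        data = .Range(.Cells(2, 1), .Cells(.Cells(.Rows.Count, 2).End(xlUp).Row, 20)).Value
    End With
    data = StackRows(data, 18, Array(19, 20))
    Sheets.Add.Range("A2").Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```

Returning a trimmed array means the caller does not write blank rows below the result. The "|" delimiter is an assumption; any character that cannot occur in the key columns will do.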