寻求有效的算法来分析类似于VBA中的数据透视表的数据

时间:2015-03-02 20:29:59

标签: algorithm vba excel-vba dictionary excel

简介

我已经有了解决这个问题所描述问题的有效方法。作为编程的新手,我认为我的解决方案相当丑陋,但是:1)我不确定这是真的,2)在互联网学习的一周阅读后,我没有看到更优雅的解决方案。由于我可以使用数据透视表在临时基础上轻松解决问题,并且主题电子表格是常见的报表样式,我认为其他许多人已经解决了这个问题(但我还没有找到或知道要搜索的关键字) )。我将提供原始数据,所需的输出,我目前工作的代码,以及我看过的一些替代方案,但没有使用,为什么。我要求你提供一个更好的编程方法 - 如果你看到一个 - 或者至少验证我的方法是合理有效的。谢谢。

原始数据

以下是包含要汇总的数据的报告示例:

Report #|    Assignee|    Type of Report|    Department|    Status
1       |     Shannon|            Ad hoc|    Accounting|    Declined
2       |     Shannon|            Ad hoc|    Accounting|    Completed
3       |     Shannon|            Change|    Accounting|    New
4       |     Shannon|            Change|      Shipping|    In Progress
5       |     Shannon|          Training|      Shipping|    Declined
6       |     Shannon|          Training|   CustService|    Completed
7       |     Shannon|          Training|   CustService|    New
8       |       Jason|            Ad hoc|   CustService|    In Progress
9       |       Jason|            Change|   CustService|    Declined
10      |       Jason|            Ad hoc|    Accounting|    Completed
11      |       Jason|          Training|    Accounting|    New
12      |      Thomas|          Training|    Accounting|    In Progress
13      |       Jason|            Change|      Shipping|    Declined
14      |       Jason|            Ad hoc|      Shipping|    Completed
15      |      Thomas|            Ad hoc|   CustService|    New
16      |       Jason|               New|   CustService|    In Progress
17      |      Thomas|               New|   CustService|    Declined
18      |      Thomas|            Change|   CustService|    Completed
19      |      Thomas|            Ad hoc|      Shipping|    New
20      |      Thomas|            Change|      Shipping|    In Progress
     -Continues in similar fashion until:-
545     |     Phyllis|               New|    Accounting|    Declined

所需输出

数据需要以类似于以下的方式进行总结。也就是说,它按受让人名称过滤,子类别的计数是每个父类别的输出。 (注意:使用数据透视表ad hoc可以很容易地获得此输出,但我想将其放入一个运行表中,该表以编程方式构建,以进行趋势分析。)

Shannon:    Type of Report       Department         Status
            Ad hoc= 25           Accounting= 45     Declined = 12
            Change= 13           CustService= 2     In Progress= 24
            Training= 3          Shipping= 75       New= 56
            New= 81                                 Completed= 30

Jason:      Type of Report       Department         Status
            Ad hoc= 12           Accounting= 21     Declined = 0
            Change= 3            CustService= 23    In Progress= 12
            Training= 20         Shipping= 4        New= 12
            New= 13                                 Completed= 24

-Continues for each "Assignee"-

我所做的总结与您从数据透视表中获得的内容非常相似,而这正是手动过程获取数据的方式。但是,我需要获取数据并将其放入每日跟踪表中,该表保留历史记录并用于每个受让人的趋势分析,因此数据透视表本身不是解决方案。

当前正在运行的代码

这是整个工作代码,包括两个被调用的子代,它们被附加到末尾:

Sub CollateData()

Dim HdrNm As New Collection 'Collection used to read and reference column indices.

'Variables used for referencing the "Assigned To" column
Dim Assignee As New Scripting.Dictionary
Dim nmAssignee As New Scripting.Dictionary
Dim Asgn As String
Dim a As Integer
Dim aKey As Variant

'Variables used for referencing the "Type of Report" column
Dim TypRep As New Scripting.Dictionary
Dim nmTypRep As New Scripting.Dictionary
Dim arrTypRep() As Integer
Dim Typ As String
Dim t As Integer

'Variables used for referencing the "Department" column
Dim Dept As New Scripting.Dictionary
Dim nmDept As New Scripting.Dictionary
Dim arrDept() As Integer
Dim Bus As String
Dim b As Integer

'Variables used for referencing the "Task Status" column
Dim TskStatus As New Scripting.Dictionary
Dim nmTskStatus As New Scripting.Dictionary
Dim arrTskStatus() As Integer
Dim Tsk As String
Dim s As Integer

'Other variables
Dim DataWS As Worksheet
Dim ScratchWS As Worksheet
Dim lastrow As Integer, x As Integer

Set DataWS = ThisWorkbook.Worksheets("SheetWithRawData")
lastrow = DataWS.Cells(Rows.Count, 11).End(xlUp).Row

Call ReadHeaderRow(DataWS, HdrNm) 'Fills the HdrNm collection with column index using column headers for keys

'Initialize variables for the loop that follows
a = 1
t = 1
b = 1
s = 1

'This next seciont/first loop goes through the report to identify a unique list of assignees and category lists _
' which need to be summed. These lists will be used to ReDim the 2-dimensional arrays to appropriate _
' size, as well as reference the elements of the 2D array(s).
'
' NOTE: I am using the seemingly duplicative Dictionaries (e.g. TypRep & nmTypeRep) in order to have _
' access to the category as both a string and as an integer/index.

For x = 2 To lastrow

    If Not Assignee.Exists(DataWS.Cells(x, HdrNm("Assigned to")).Value) Then
        Assignee.Add DataWS.Cells(x, HdrNm("Assigned to")).Value, a
        nmAssignee.Add a, DataWS.Cells(x, HdrNm("Assigned to")).Value
        a = a + 1
    End If

    If Not TypRep.Exists(DataWS.Cells(x, HdrNm("Type of Report")).Value) Then
        TypRep.Add DataWS.Cells(x, HdrNm("Type of Report")).Value, t
        nmTypRep.Add t, DataWS.Cells(x, HdrNm("Type of Report")).Value
        t = t + 1
    End If

    If Not Dept.Exists(DataWS.Cells(x, HdrNm("Department")).Value) Then
        Dept.Add DataWS.Cells(x, HdrNm("Department")).Value, b
        nmDept.Add b, DataWS.Cells(x, HdrNm("Department")).Value
        b = b + 1
    End If

    If Not TskStatus.Exists(DataWS.Cells(x, HdrNm("Task Status")).Value) Then
        TskStatus.Add DataWS.Cells(x, HdrNm("Task Status")).Value, s
        nmTskStatus.Add s, DataWS.Cells(x, HdrNm("Task Status")).Value
        s = s + 1
    End If

Next x

'Assign the appropriate dimensions to the following 2D arrays
ReDim arrTypRep(1 To Assignee.Count, 1 To TypRep.Count)
ReDim arrDept(1 To Assignee.Count, 1 To Dept.Count)
ReDim arrTskStatus(1 To Assignee.Count, 1 To TskStatus.Count)

'The following, second loop now goes through and sums up the count of each category element for each _
' Assignee.  Using this technique, I only go through the list/report once (or twice, if you consider _
' the previous loop to dimension the arrays) in order to tabulate the desired data.

For x = 2 To lastrow

    Asgn = DataWS.Cells(x, HdrNm("Assigned to")).Value
    Typ = DataWS.Cells(x, HdrNm("Type of Report")).Value
    Bus = DataWS.Cells(x, HdrNm("Department")).Value
    Tsk = DataWS.Cells(x, HdrNm("Task Status")).Value
    arrTypRep(Assignee.item(Asgn), TypRep.item(Typ)) = arrTypRep(Assignee.item(Asgn), TypRep.item(Typ)) + 1
    arrDept(Assignee.item(Asgn), Dept.item(Bus)) = arrDept(Assignee.item(Asgn), Dept.item(Bus)) + 1
    arrTskStatus(Assignee.item(Asgn), TskStatus.item(Tsk)) = arrTskStatus(Assignee.item(Asgn), TskStatus.item(Tsk)) + 1

Next x

'Now to generate the output of the data we collected:
On Error Resume Next
Application.DisplayAlerts = False
With ThisWorkbook
    .Worksheets("DesiredOutput").Delete
    .Worksheets.Add after:=.Worksheets(1)
End With
Application.DisplayAlerts = True
On Error GoTo 0

Set ScratchWS = ThisWorkbook.ActiveSheet
ScratchWS.Name = "DesiredOutput"

x = 1

'Loop through each Assignee and dump out the collected counts
For Each aKey In Assignee

    Call OutputData("Type of Report", Assignee, nmAssignee, aKey, TypRep, nmTypRep, arrTypRep, x)
    Call OutputData("Department", Assignee, nmAssignee, aKey, Dept, nmDept, arrDept, x)
    Call OutputData("Task Status", Assignee, nmAssignee, aKey, TskStatus, nmTskStatus, arrTskStatus, x)

Next aKey

Range("B1").ColumnWidth = 3
Range("A1, C1").EntireColumn.AutoFit

End Sub

****************************************************************************
****************************************************************************

Sub OutputData(Title As String, Assignee As Scripting.Dictionary, nmAssignee As Scripting.Dictionary, _
    aKey As Variant, ReportCategory As Scripting.Dictionary, nmReportCategory As Scripting.Dictionary, _
    arrCategory() As Integer, x As Integer)

Dim CatKey As Variant

With Cells(x, 2)
   .Value = Title
   .Font.Bold = True
End With

x = x + 1

For Each CatKey In ReportCategory
    Cells(x, 1).Value = nmAssignee.item(Assignee.item(aKey))
    Cells(x, 3).Value = nmReportCategory.item(ReportCategory.item(CatKey))
    Cells(x, 4).Value = arrCategory(Assignee.item(aKey), ReportCategory.item(CatKey))
    x = x + 1
Next CatKey

x = x + 1

End Sub

**************************************************************************
**************************************************************************

Private Sub ReadHeaderRow(TargetWS As Worksheet, HdrNm As Collection)

Dim lastcolumn As Integer
Dim x As Integer

lastcolumn = TargetWS.Cells(1, Columns.Count).End(xlToLeft).Column

For x = 1 To lastcolumn
    HdrNm.Add TargetWS.Cells(1, x).Column, TargetWS.Cells(1, x).Value
Next x

End Sub

考虑/放弃其他可能的解决方案

循环每个受让人的名单

我考虑过生成一个字典/受让人集合,然后在每次传递中循环收集每个受让人的数据报告,但是受让人的数量可能会改变(上升),报告列表可能会上升,因此许多无关的通过列表。

Jagged Arrays / Collections / Dictionaries

当我第一次知道我可以做字典词典(数组数组等)时,我很兴奋,但据我所知,我不能使用第一个字典作为受让人(第一维)和第二个字典对于所有一个类别(例如报告类型)作为第二个维度。我实际上需要为每个受让人和类别创建一个单独的字典。换句话说,如果我只有一个类别(报告类型)和15个受让人,我实际上需要创建16个词典:第一个字典是“受让人” - 受让人的名字作为密钥,第2到第16个词典( TypRep1到TypRep15)作为与字典受让人中的键对应的项目。另外,我无法动态创建字典,因为受让人的数量可能会发生变化,所以这种方法适合我,除非我误解了一些重要的事情(总是可能的)。我对锯齿状数据类型的了解来自:http://bytecomb.com/collections-of-collections-in-vba/

自定义数据类型

我没有尝试过这个,因为我刚刚遇到过它,我对此并不了解,但也许这个问题可以作为自定义数据类型来解决。我将更多地阅读它们,但也许这是一个我还不了解的更好的解决方案。

结论声明

我知道这很难读,对不起。感谢您坚持这一点。我非常感谢有关如何实现上述代码实现的任何建议。我有信心我没有找到更好的方法的原因是这个问题的解决方案对我们每个人来说都是显而易见的,似乎任何在VBA / Excel编码的人都会遇到一些频率。感谢您的帮助。

1 个答案:

答案 0 :(得分:0)

我们在编程中使用对象。大多数计算机都有Excel,所以你可以为你做得很好。

Windows带有一个对象,我认为它是一种数据类型。在内存中创建的,断开连接的记录集。

所以你会这样做

  rs.filter = "Assignee='Shannon' AND Status='Cancelled'"

然后

 msgbox rs.recordcount

给你这个号码。

或者您可以排序和枚举。


这会从文件的顶部或底部剪切线条。

cscript scriptname.vbs "" t x 5 <infile.txt >outfile.txt



Set Arg = WScript.Arguments
set WshShell = createObject("Wscript.Shell")
Set Inp = WScript.Stdin
Set Outp = Wscript.Stdout

    Set rs = CreateObject("ADODB.Recordset")
    With rs
        .Fields.Append "LineNumber", 4 

        .Fields.Append "Txt", 201, 5000 
        .Open
        LineCount = 0
        Do Until Inp.AtEndOfStream
            LineCount = LineCount + 1
            .AddNew
            .Fields("LineNumber").value = LineCount
            .Fields("Txt").value = Inp.readline
            .UpDate
        Loop

        .Sort = "LineNumber ASC"

        If LCase(Arg(1)) = "t" then
            If LCase(Arg(2)) = "i" then
                .filter = "LineNumber < " & LCase(Arg(3)) + 1
            ElseIf LCase(Arg(2)) = "x" then
                .filter = "LineNumber > " & LCase(Arg(3))
            End If
        ElseIf LCase(Arg(1)) = "b" then
            If LCase(Arg(2)) = "i" then
                .filter = "LineNumber > " & LineCount - LCase(Arg(3))
            ElseIf LCase(Arg(2)) = "x" then
                .filter = "LineNumber < " & LineCount - LCase(Arg(3)) + 1
            End If
        End If

        Do While not .EOF
            Outp.writeline .Fields("Txt").Value

            .MoveNext
        Loop
    End With

一种方法是将4个标准放入4个阵列 - 受让人,状态,部门,另一个。

For each a in assignee()
    For each b in Status()
        For each c in Dept()
            For each d in other()
                .filter = "assinnee=" & a & "AND Status=" & b & "And dept=" & c
                msgbox .recordcount
            Next
        Next
    Next
Next