Excel数据透视表和百分位数/四分位数/中位数

时间:2017-12-11 23:24:38

标签: excel vba excel-2010 pivot-table

在我的工作中,经常出现的一件事是需要对数据集执行中位数,四分位数和百分位数运算。我也被迫使用excel(不是我的选择),并且出于功能原因我也被迫使用数据透视表('dem slicers,hot damn)。

Excel(至少Excel 2010)在其数据透视表中没有此功能。一些附加组件,例如power pivot apparently add this,但不是每个人都有工作环境,您可以随时安装。还有其他一些值得注意的解决方法:

这些变通方法很棒,但是当我有超过10万行的数据时,它开始花费一些时间来处理我的烤面包机 - 而且我想要数据透视表的全部原因是我可以有一个光滑的切片机驱动的仪表板那很漂亮,反应灵敏。同样重要的是,当您获得高于65k +数据点时,使用Excel的条件和公式会破坏它,即使工作表可以达到1M行数据。

基本上,我想要一个强大而可靠的方法来计算数据透视表中给定选定变量的百分位数据,我希望它不会以冰川速度移动。我还希望它在数据子类别> 65K行时工作。

1 个答案:

答案 0 :(得分:1)

我的问题解决方案是用VBA编写的。这个答案将分为两部分:

  • 我正在做什么的概念性解释
  • 代码,我将在底部粘贴并移开。

概念

在(几乎)一个句子中:我的方法是数据透视表的目标,在其旁边创建一个适当大小的范围,并使用目标值列向下钻取数据透视表该子类别的数据 - 从这里百分位数据通过QUICKSELECT算法计算并粘贴到命名范围并清理/删除深入数据表。然后洗;冲洗;重复。

在多个句子中:

我们的主要功能( FindPivotPercentiles )将:目标数据透视表,值列的标题字符串作为输入,以计算值,以及可选的百分位数来计算。

我们通过调用 PivotEdgeRange 函数找到结果输出范围来计算我们计算的百分位数,该函数采用目标数据透视表(用于计算输出范围的垂直范围)和百分位数(使用计算范围的水平延伸 - 因此计算0.25,.5和.75百分位数的10类长度数据透视表将返回数据透视表右侧的3 x 10目标范围。

我们还将目标列设置为数据透视表中的最后一个值列,这仅用于深入查看数据,只要选择了适当的类别,所选的实际值无关紧要。

然后我们遍历数据列范围内的行,调用函数 PivotPercentiles 。该函数是实际向下钻入数据透视表的位,获取从调用函数提供的目标单元格,以及我们希望计算的实际值的标题字符串。向下钻取是使用targetCell.ShowDetail = True执行的,Excel创建的工作表由子程序 autoRenamePivotDetail 自动重命名(我想这不是绝对必要的,但我发现不是超显式关于这些东西后来叮咬我的屁股)。此钻取数据中的目标数据列从先前提供的标头字符串中选择,并循环以创建数组。

使用QUICKSELECT算法从数组( QuickSelect,QSPartition QSSwapElems )计算所需的百分位值,该算法类似QUICKSORT,但在确定时停止已找到第k个值。

返回数据透视表中该行的百分位数值数组后,不再需要向下钻取数据表并将其删除。百分位数返回到顶级调用函数,以粘贴到 FindPivotPercentiles 函数中确定的“结果范围”中,然后该过程将重复下一个可用的数据行,直到所有数据都用完为止。

最后,完成所有这些操作后,找到结果范围和实际数据透视表范围的并集,并给出一个名称,以便每次使用切片器更改显示的数据透视数据时,命名范围是动态更新的,因此任何依赖关系/图形也会更新。

输出看起来有点像这样: example of the output from running this subroutine

在此示例中,红色范围是向下钻取的目标列(但不一定是用于计算的实际目标数据),绿色是结果范围,蓝色是输出命名范围,供用户以后使用< / p>

限制

  • 只能计算一个目标值类型
  • 要求其他工作表名称具有合理的名称(即,没有其他形式的“工作表#”)
  • 您需要使用放置在ThisWorkbook模块中的Slicer / PivotTable更新事件,以便每次使用切片器时实际更改它。那里已经有一个tonne of info,所以我不会在这里复制任何内容。

代码

Sub FindPivotPercentiles(ByRef targetWorksheet As Worksheet, _
                         ByRef targetPivot As PivotTable, _
                         ByRef columnToCalc As String, _
                Optional ByRef percentInputs As Variant = -1)
'**********************************************************
'** Finds all Percentile Data for a given pivot table    **
'**********************************************************
Dim valueTitle As String                ' detail sheet column title to calc values from
Dim targetColumn As Range               ' Range of above column
Dim resultColumn As Range               ' output range for percentile values
Dim wholePivotRange As Range            ' selects entire pivot for naming range
Dim percentiles() As Single             ' desired percentile levels to calculate
Dim j As Long                           ' iterator
Dim k As Long                           ' iterator

' ------------------------------------------------------------------------------------
' User selectable things, choose the column to calculate percentiles for (eg "DAP")
' ------------------------------------------------------------------------------------
valueTitle = columnToCalc

' ------------------------------------------------------------------------------------
' Initalize the percentile array with default values if no user specified option present
' ------------------------------------------------------------------------------------
If IsArray(percentInputs) = False Then
    If percentInputs = -1 Then
        ReDim percentiles(1 To 7) As Single
            percentiles(1) = 0#
            percentiles(2) = 0.1
            percentiles(3) = 0.25
            percentiles(4) = 0.5
            percentiles(5) = 0.75
            percentiles(6) = 0.9
            percentiles(7) = 1#
    End If
Else
    ' ------ otherwise make them the input values
    ReDim percentiles(1 To UBound(percentInputs) - LBound(percentInputs) + 1) As Single
    For j = 1 To UBound(percentInputs) - LBound(percentInputs) + 1
        percentiles(j) = percentInputs(j - (1 - LBound(percentInputs)))
    Next j
End If

Set resultColumn = PivotEdgeRange(targetWorksheet, targetPivot, percentiles)
Set targetColumn = targetPivot.DataBodyRange.Resize(, 1)

' ------------------------------------------------------------------------------------
' Clean up work area, create some percentile headers and appropriately format then,
' then calculate percentile values and paste in area to right of pivot table
' ------------------------------------------------------------------------------------
resultColumn.Resize(, 100).EntireColumn.Clear
targetColumn.Resize(1).Offset(-1, 0).Copy
resultColumn.Resize(1).Offset(-1, 0).PasteSpecial xlPasteFormats
resultColumn.Resize(1).Offset(-1, 0) = percentiles

For j = 1 To targetColumn.Rows.count
    resultColumn.Rows(j) = PivotPercentiles(targetColumn.Rows(j), targetWorksheet, valueTitle, percentiles, True)
    Debug.Print "calculating row: " & j
Next j

' ------ format nicely, name range for later use
With resultColumn
    .NumberFormat = "0.00"
    Set resultColumn = .Resize(.Rows.count + 1).Offset(-1, 0)
    .Name = valueTitle & "_Percents"
End With

Set wholePivotRange = targetPivot.TableRange1
Set wholePivotRange = Union(wholePivotRange, resultColumn)
wholePivotRange.Name = valueTitle & "_WholeTable"

End Sub

PivotEdgeRange

Function PivotEdgeRange(ByRef targetWorksheet As Worksheet, _
                        ByRef targetPivot As PivotTable, _
               Optional ByRef percentInputs As Variant = -1) As Range
'*************************************************************************
'** For a given pivot table, set a range to the right most empty column **
'*************************************************************************
Dim numDataCols As Long                 ' number of value fields in table
Dim howManyOutputColumns As Integer     ' how many percentile calculation cols

If IsArray(percentInputs) = False Then
    If percentInputs = -1 Then: howManyOutputColumns = 7
Else
    howManyOutputColumns = UBound(percentInputs)
End If

numDataCols = targetPivot.DataBodyRange.Columns.count
Set PivotEdgeRange = targetPivot.DataBodyRange.Resize(, howManyOutputColumns).Offset(, numDataCols)

End Function

PivotPercentiles

Function PivotPercentiles(ByRef targetCell As Range, _
                          ByRef pivotCacheSheet As Worksheet, _
                          ByRef valueTitle As String, _
                 Optional ByRef percentInputs As Variant = -1, _
                 Optional ByRef suppressErrors As Boolean = False) As Double()
'************************************************************************
'** Finds the Percentile Data for a given grouping in a pivot table    **
'************************************************************************
Dim targetColumn As Range           ' targetColumn for calculation
Dim numberOfVals As Long            ' number of values, used to determine array size
Dim j As Long                       ' iterator
Dim badData As Boolean              ' flags that non-numeric data was in target column
Dim range2Array() As Double         ' coverts target range to array for faster calculation
Dim percentiles() As Single         ' desired percentile levels to calculate
Dim percentOutputs() As Double      ' holds calculated percentiles
Dim targetkth As Long               ' kth smallest value to extract from quickselect
' ------------------------------------------------------------------------------------
' Initalize the percentile array with default values if no user specified option
' present. This is already done in FindPivotPercentiles, so is mostly redundant, but
' I have left this in so you may have the option of using this as a stand alone function
' ------------------------------------------------------------------------------------
If IsArray(percentInputs) = False Then
    If percentInputs = -1 Then
        ReDim percentiles(1 To 7) As Single
            percentiles(1) = 0#
            percentiles(2) = 0.1
            percentiles(3) = 0.25
            percentiles(4) = 0.5
            percentiles(5) = 0.75
            percentiles(6) = 0.9
            percentiles(7) = 1#
    End If
Else
    ' ------ otherwise make them the input values
    ReDim percentiles(1 To UBound(percentInputs) - LBound(percentInputs) + 1) As Single
    For j = 1 To UBound(percentInputs) - LBound(percentInputs) + 1
        percentiles(j) = percentInputs(j - (1 - LBound(percentInputs)))
    Next j
End If

ReDim percentOutputs(1 To UBound(percentiles)) As Double
' ------------------------------------------------------------------------------------
' Show detail in target cell so we can compute the percentile data values. Show detail
' creates a new sheet with name "Sheet*" - we switch to this, and compute the medians
' in this space, before deleting it.
' ------------------------------------------------------------------------------------
targetCell.ShowDetail = True
Call autoRenamePivotDetail
Set pivotCacheSheet = ThisWorkbook.Sheets("PivotDetail")

Set targetColumn = pivotCacheSheet.ListObjects(1).ListColumns(valueTitle).DataBodyRange
numberOfVals = targetColumn.Rows.count

' ----- convert range to array, quicker computation time
ReDim range2Array(1 To numberOfVals) As Double
For j = 1 To targetColumn.Rows.count
    Select Case VarType(targetColumn(j))
        Case 2 To 4
            range2Array(j) = CDbl(targetColumn(j))
        Case 5
            range2Array(j) = targetColumn(j)
        Case 8
            range2Array(j) = CDbl(targetColumn(j))
        Case Else
            range2Array(j) = Empty
            badData = True
    End Select
Next j
' ------------------------------------------------------------------------------------
' Now we have an array to play with, loop over desired percentile values and calculate
' If arrays are no larger than 10, the .Percentile_Inc function can be used instead -
' this is neccessary as QuickSelect breaks when less than 3 array size.
' ------------------------------------------------------------------------------------
For j = 1 To UBound(percentiles)
    If UBound(range2Array) < 10 Then
        percentOutputs(j) = Application.WorksheetFunction.Percentile_Inc(range2Array, percentiles(j))
    Else
        targetkth = percentiles(j) * UBound(range2Array)
        If targetkth = 0 Then: targetkth = 1
        percentOutputs(j) = QuickSelect(range2Array, targetkth, 1, UBound(range2Array))
    End If
Next j

' ------ clean up
Application.DisplayAlerts = False
pivotCacheSheet.Delete
Application.DisplayAlerts = True

If badData And suppressErrors <> True Then
    MsgBox "Bad data (non-numeric) was found in the target range. Please ensure better cleaning of input data."
End If

' ----- output values to calling function
PivotPercentiles = percentOutputs
End Function

autoRenamePivotTable

Sub autoRenamePivotDetail()
'*****************************************************************************
'** Automatically rename the newly created detailed pivot data. This        **
'** relies on all worksheets otherwise having sensible names, it will ruin  **
'** your sheet names otherwise                                              **
'*****************************************************************************
Dim detailedPivotFound As Boolean       ' True if pivot detail sheet found
Dim wSheet As Worksheet                 ' Worksheet iterator
Dim renameSheet As Worksheet            ' Sheet to actually rename

detailedPivotFound = False
If CheckPresent("PivotDetail", "Worksheet") Then
    Err.Raise Number:=2000, _
    Source:="AlreadyPresent", _
    Description:="Cannot rename worksheet; 'PivotDetail sheet already present"
End If

' ------ cycle through, change name to pivotsheet
For Each wSheet In ThisWorkbook.Worksheets
    If wSheet.Name Like "Sheet*" And detailedPivotFound = True Then
        Err.Raise Number:=2000, _
        Source:="AlreadyPresent", _
        Description:="Cannot expand pivot detail; spare 'sheet' name already present."
    End If
    If wSheet.Name Like "Sheet*" Then
        detailedPivotFound = True
        Set renameSheet = wSheet
    End If
Next wSheet
renameSheet.Name = "PivotDetail"

' ------------------------------------------------------------------------------------
' Error handling for cases which do not have a CheckPresent datatype defined for them
' as of yet, or if multiple "sheet*" names are found
' ------------------------------------------------------------------------------------
AlreadyPresent_End:
    Exit Sub
AlreadyPresent_Err:
    MsgBox Prompt:="Error number " & Err.Number & " was raised. " & _
        vbCrLf & "Source: " & Err.Source & vbCrLf & _
        "Description: " & Err.Description
    Resume AlreadyPresent_End
End Sub

所有QuickSelect内容:

Function QuickSelect(list() As Double, k As Long, startInterval As Long, endInterval As Long) As Double
'**********************************************************************
'** Uses a partial form of QUICKSELECT to find k'th percentile value **
'**********************************************************************
Dim pivotInterval As Long                       ' where the current pivotpoint is
Dim splitInterval As Long                       ' where to split the interval
Dim notFinished As Boolean: notFinished = True  ' flips to true once kth val found
' ------------------------------------------------------------------------------------
' Adapted from:
' https://stackoverflow.com/questions/3779763/fast-algorithm-for-computing-percentiles-to-remove-outliers
' More information on the QuickSelect algorithm: https://en.wikipedia.org/wiki/Quickselect
' ------------------------------------------------------------------------------------
While notFinished
    pivotInterval = CLng(Floor(startInterval + endInterval) / 2)

    ' ------ make sure min and max cases handled properly
    If k = 1 Then pivotInterval = CLng(Floor(startInterval + endInterval) / 2)
    If k = endInterval Then pivotInterval = CLng(Ceiling(startInterval + endInterval) / 2)

    splitInterval = QSPartition(list, startInterval, endInterval, pivotInterval)

    If k < splitInterval Then
        endInterval = splitInterval
    ElseIf k > splitInterval Then
        startInterval = splitInterval + 1
    Else
        QuickSelect = list(k)
        notFinished = False
    End If

    ' ------ break loop for max case.
    If k = startInterval And k = endInterval Then
        QuickSelect = list(k)
        notFinished = False
    End If
Wend
End Function

Function QSPartition(list() As Double, startInterval As Long, endInterval As Long, _
                    pivotInterval As Long) As Long
'**********************************************************
'** Swaps higher and lower elements within the partition **
'**********************************************************
Dim pivotValue As Double        ' the value of the pivot point
Dim newPivot As Double          ' the new pivot point after partitioning
Dim storeInterval As Double     ' temp value to help swapping pivot value
Dim i As Double                 ' iterator

pivotValue = list(pivotInterval)
list(pivotInterval) = list(startInterval)
list(startInterval) = pivotValue

storeInterval = startInterval + 1
While (storeInterval < endInterval) And list(storeInterval) <= pivotValue
    storeInterval = storeInterval + 1
Wend

For i = storeInterval + 1 To endInterval
    If list(i) <= pivotValue Then
        Call QSSwapElems(list, i, storeInterval)
        storeInterval = storeInterval + 1
    End If
Next i

newPivot = storeInterval - 1
list(startInterval) = list(newPivot)
list(newPivot) = pivotValue

QSPartition = newPivot
End Function

Sub QSSwapElems(ByRef list() As Double, ByVal i As Long, ByVal j As Long)
'************************
'** Swap list elements **
'************************
Dim temp As Double

temp = list(i)
list(i) = list(j)
list(j) = temp
End Sub

其他需要的其他代码

Function CheckPresent(checkName As String, checkType As String) As Boolean
'*******************************************************************
'** Checks that <checkname> is present in object type <checktype> **
'*******************************************************************
Dim index As Long                   ' iterator

CheckPresent = False
index = 1

Select Case checkType
    Case "Connections"
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ' Find whether a given named Connection exists
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        Dim conName As Connections
        Dim conParent As Variant

        Set conParent = ThisWorkbook.Connections
        While (CheckPresent <> True And index <= conParent.count)
            If conParent(index).Name = checkName Then: CheckPresent = True
            index = index + 1
        Wend

    Case "Worksheet"
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ' Find whether a given named worksheet exists
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        Dim wName As Worksheet
        Dim sheetParents As Sheets

        Set sheetParents = ThisWorkbook.Worksheets
        While (CheckPresent <> True And index <= sheetParents.count)
            If sheetParents(index).Name = checkName Then: CheckPresent = True
            index = index + 1
        Wend

        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        ' handle cases where no data type exists,
        ' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Case Else
        Err.Raise Number:=vbObjectError + 1001, _
        Source:="TypeNotPresent", _
        Description:="This function does not handle variable type: " & checkType & _
                     ". Please create a case to handle it in the CheckPresent function."

End Select

' ------------------------------------------------------------------------------------
' Error handling for cases which do not have a CheckPresent datatype defined for them
' as of yet.
' ------------------------------------------------------------------------------------
TypeNotPresent_End:
    Exit Function

TypeNotPresent_Err:
    MsgBox Prompt:="Error number " & Err.Number & " was raised. " & _
        vbCrLf & "Source: " & Err.Source & vbCrLf & _
        "Description: " & Err.Description
    Resume TypeNotPresent_End


End Function