VB.net histogram - how to bin data

Time: 2015-03-03 12:25:52

Tags: vb.net statistics histogram binning

I'm working on a histogram class, in particular the binning method.

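In that regard, I have two questions:

  1. From a logical/statistical point of view, is this a correct/appropriate algorithm?

  2. Is the code optimal, or at least decent? Please tell me how to improve it.

Any help is much appreciated. Thanks in advance.

Here is my code so far: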

    Public Class Histo
        Dim data() As Double
        Dim bins As Integer = 0
        Dim bw As Double = 0
        Dim _min As Double = 0
        Dim _max As Double = 0
        Dim arrMax As Double = 0
        Dim cht As Chart
        Public Shared Decimals As Integer

        Public Sub New(_arr() As Double, _cht As Chart)
            'One-dimensional array as data
            data = _arr

            'Number of bins, using Sturges' method
            bins = NoBin_ST(data)

            'calculate the bin width
            bw = Range(data) / bins

            'bin boundaries for the first bin
            _min = Min(data)
            _max = _min + bw

            'max of data
            arrMax = Max(data)

            'chart object
            cht = _cht

            'number of decimals on the x-axis
            Decimals = Dec
        End Sub

        Public Function Binning() As Integer()
            'Binning "algorithm" for continuous data
            '
            'RETURN: one-dimensional array with n bins
            '
            Array.Sort(data)
            Dim j As Integer = 0
            Dim mn As Double = _min
            Dim mx As Double = _max
            Dim counter(bins - 1) As Integer

            For i As Integer = 0 To data.GetLength(0) - 1
                'check if the data point is within the boundaries of the current bin
                If data(i) >= mn AndAlso data(i) < mx Then
                    'increment the counter of the current bin
                    counter(j) += 1
                Else
                    'special case: at the end, at least one data point will equal the max of the last bin
                    ' and must be counted in that bin
                    If data(i) = arrMax Then
                        counter(j) += 1
                        Continue For
                    End If
                    'the data point has exceeded the boundaries of the current bin
                    ' and must be counted in the next bin
                    'min and max are increased by the bin width
                    mn += bw
                    mx += bw
                    'go to the next bin
                    j += 1
                    'count the data point in this bin and loop again
                    counter(j) += 1
                End If
            Next
            Return counter
        End Function
    .....
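
The helpers `NoBin_ST`, `Range`, `Min` and `Max` are not shown in the question, so what follows is only a sketch under stated assumptions, not the asker's code. It assumes `NoBin_ST` implements Sturges' rule in its common form k = ⌈log₂ n⌉ + 1 and that `Range` returns max - min. The counting loop differs from `Binning` in two ways: a `While` loop skips over any number of empty bins, and each bin edge is computed as `lo + (j + 1) * width` instead of by repeatedly adding `bw`, which avoids accumulating rounding error. The first change matters because the `Else` branch in `Binning` advances only one bin per data point: with the data 1, 2, 2, 3, 9, 10 and `bins = 3`, it counts {4, 2, 0}, although 9 and 10 belong in the third bin. `HistogramSketch`, `SturgesBinCount` and `CountSorted` are hypothetical names.

```vbnet
Imports System

' Hypothetical sketch: SturgesBinCount stands in for the question's NoBin_ST (not shown),
' CountSorted for the counting loop in Binning.
Public Module HistogramSketch
    ' Sturges' rule, common form: k = Ceiling(Log2(n)) + 1.
    ' Ceiling(Log2(n)) is computed with integers (smallest exponent with 2^exponent >= n)
    ' to avoid floating-point rounding in Math.Log.
    Public Function SturgesBinCount(n As Integer) As Integer
        If n < 1 Then Throw New ArgumentOutOfRangeException("n")
        Dim exponent As Integer = 0
        Dim p As Long = 1
        While p < n
            p *= 2
            exponent += 1
        End While
        Return exponent + 1
    End Function

    ' Sort-then-scan counting over binCount equal-width bins spanning [min, max].
    ' The last bin is closed on the right, so the maximum is counted in it.
    Public Function CountSorted(values As Double(), binCount As Integer) As Integer()
        If values Is Nothing Then Throw New ArgumentNullException("values")
        If values.Length = 0 Then Throw New ArgumentException("values must not be empty", "values")
        If binCount < 1 Then Throw New ArgumentOutOfRangeException("binCount")

        ' Sort a copy so the caller's array is left unchanged
        Dim data = DirectCast(values.Clone(), Double())
        Array.Sort(data)

        Dim lo As Double = data(0)
        Dim width As Double = (data(data.Length - 1) - lo) / binCount
        Dim counts(binCount - 1) As Integer
        Dim j As Integer = 0

        For Each x As Double In data
            ' Advance past every bin whose upper edge is <= x, including empty bins.
            ' Edges are recomputed as lo + (j + 1) * width rather than accumulated.
            While j < binCount - 1 AndAlso x >= lo + (j + 1) * width
                j += 1
            End While
            counts(j) += 1
        Next
        Return counts
    End Function
End Module
```

For example, `SturgesBinCount(6)` returns 4, and `CountSorted(New Double() {10, 2, 9, 1, 3, 2}, 3)` returns {4, 0, 2}.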
    

1 Answer:

Answer 0 (score: 0)

I don't know whether this performs any better, but I think it's a bit simpler.

    Function CreateBins(values As IEnumerable(Of Double), numberOfBins As Integer) As IGrouping(Of Integer, Double)()
        If values Is Nothing Then Throw New ArgumentNullException("values", "Values cannot be null")
        If values.Distinct.Count < 2 Then Throw New ArgumentException("Values must contain at least two distinct elements", "values")
        If numberOfBins < 1 Then Throw New ArgumentOutOfRangeException("numberOfBins", "numberOfBins must be an integer >= 1")

        Dim min = values.Min
        Dim max = values.Max
        Dim binSize = (max - min) / numberOfBins
        ' Checking for two distinct elements should eliminate the possibility of min = max and therefore binSize = 0

        ' Floor((x - min) / binSize) can equal numberOfBins for x = max, so clamp the key to keep the max in the last bin.
        ' GroupBy returns groups in order of first occurrence, so sort them by key.
        Dim bins = values.GroupBy(Function(x) Math.Min(Convert.ToInt32(Math.Floor((x - min) / binSize)), numberOfBins - 1)) _
                         .OrderBy(Function(g) g.Key) _
                         .ToArray

        ' Group counts are available through the IEnumerable Count function
        ' Dim counts = bins.Select(Function(x) x.Count)
        ' Or, retaining the group key:
        ' Dim counts = bins.Select(Function(x) New With {Key .Key = x.Key, .Count = x.Count})

        Return bins
    End Function

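Each bin is now a group. The original values are kept as part of the group for potential later analysis. Counts are available through the group's Count() function.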
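
Two details of the `GroupBy` approach are worth handling if the result feeds a chart: `GroupBy` creates a group only for keys that actually occur, so an empty bin has no group at all, and `Floor((x - min) / binSize)` can equal `numberOfBins` when x is the maximum, which puts the maximum in a group of its own. The sketch below clamps the key into the range 0 to numberOfBins - 1 and copies each group's `Count()` into a dense `Integer()` so that empty bins appear as 0. Like the answer, it assumes the values contain at least two distinct elements (otherwise `binSize` is 0). `GroupBinningSketch`, `BinIndex` and `DenseCounts` are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical helpers built on the answer's GroupBy approach.
Public Module GroupBinningSketch
    ' Bin index for x, clamped to 0..numberOfBins - 1 so the maximum lands in the last bin.
    Public Function BinIndex(x As Double, min As Double, binSize As Double, numberOfBins As Integer) As Integer
        Dim k As Integer = Convert.ToInt32(Math.Floor((x - min) / binSize))
        Return Math.Max(0, Math.Min(k, numberOfBins - 1))
    End Function

    ' Group values by bin, then copy each group's Count() into a dense array.
    ' Bins that have no group keep their default count of 0.
    ' Assumes at least two distinct values, so binSize > 0.
    Public Function DenseCounts(values As IEnumerable(Of Double), numberOfBins As Integer) As Integer()
        Dim min As Double = values.Min()
        Dim binSize As Double = (values.Max() - min) / numberOfBins
        Dim counts(numberOfBins - 1) As Integer
        For Each g As IGrouping(Of Integer, Double) In values.GroupBy(Function(x) BinIndex(x, min, binSize, numberOfBins))
            counts(g.Key) = g.Count()
        Next
        Return counts
    End Function
End Module
```

With the values 1, 2, 2, 3, 9, 10 and three bins, `DenseCounts` returns {4, 0, 2}. Without the clamp the keys would be 0, 0, 0, 0, 2, 3: there would be no group for the middle bin and an extra group with key 3.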