估计Excel中行之间的重复百分比

时间:2015-10-02 23:25:50

标签: excel vba duplicates similarity

我有一个包含200多个变量(列)的Excel(2010)数据文件,以及超过1,000条记录(行),每个记录都标有唯一的ID号。但是,我怀疑这些记录中的一些是捏造的,即有人拿现有记录,复制它,只改变了几个数字,使它有点不同。因此,我需要生成一个矩阵,向我显示相同值的数量/百分比"在每个记录和所有其他记录之间(例如,记录1和记录2共享75个相等的值,记录1和记录3共享57个相等的值,记录2和记录3共享45个相等的值等)。我有一些解决方法,但它们需要数小时而且不会产生简单的矩阵。我不关心价值观之间的区别 - 只是它们是否相等。任何想法将不胜感激!

2 个答案:

答案 0 :(得分:1)

不知道这对大型数据集的表现如何,但是:

Sub T()

    Dim d, m(), nR As Long, nC As Long, r As Long, r2 As Long, c As Long
    Dim v1, v2, i As Long
    d = Sheet1.Range("A1").CurrentRegion.Value
    nR = UBound(d, 1)
    nC = UBound(d, 2)
    ReDim m(1 To nR, 1 To nR)

    For r = 1 To nR
        For r2 = r To nR
            i = 0
            For c = 1 To nC
                v1 = d(r, c): If IsError(v1) Then v1 = "Error!"
                v2 = d(r2, c): If IsError(v2) Then v2 = "Error!"
                If v1 = v2 Then i = i + 1
            Next c
            m(r2, r) = i
        Next r2
    Next r

    With Sheet2
        .Range("B2").Resize(nR, nR).Value = m
        'assuming your id's are in the first column...
        For r = 1 To nR
            .Cells(1 + r, 1) = d(r, 1)
            .Cells(r, r + 1) = d(r, 1)
        Next r
    End With

End Sub

答案 1 :(得分:0)

我对HTML有点沉闷并发帖......不是程序员,所以请宽恕一切......

Sub CalculateDuplicationBetweenRecords()

Dim myCases As Long
Dim myVariables As Long
Dim myCurrentCase As Long
Dim myComparisonCase As Long
Dim myCurrentVariable As Long
Dim myCurrentCell As Long
Dim myComparisonCell As Long
Dim myCounter As Long

'   Would be nice to automate number of cases and variables...
myCases = 88
myVariables = 81
'   Insert case #1 id for results matrix - cosmetic...
Worksheets("Sheet2").Cells(1, 2).Value = Worksheets("Sheet1").Cells(1, 1).Value

For myCurrentCase = 1 To myCases - 1
    For myComparisonCase = myCurrentCase + 1 To myCases
        myCounter = 0
        For myCurrentVariable = 1 To myVariables
            myCurrentCell = Worksheets("Sheet1").Cells(myCurrentCase, myCurrentVariable).Value: If IsError(myCurrentCell) Then myCurrentCell = "Error!"
            myComparisonCell = Worksheets("Sheet1").Cells(myComparisonCase, myCurrentVariable).Value: If IsError(myComparisonCell) Then myComparisonCell = "Error!"
            If myCurrentCell = myComparisonCell Then myCounter = myCounter + 1
        Next myCurrentVariable
        Worksheets("Sheet2").Cells(myCurrentCase + 1, 1).Value = Worksheets("Sheet1").Cells(myCurrentCase, 1).Value
        Worksheets("Sheet2").Cells(1, myComparisonCase + 1).Value = Worksheets("Sheet1").Cells(myComparisonCase, 1).Value
        Worksheets("Sheet2").Cells(myCurrentCase + 1, myComparisonCase + 1).Value = myCounter
    Next myComparisonCase
Next myCurrentCase

End Sub