查找没有按特定顺序重复的行,并显示按重复次数排序的结果

时间:2014-07-31 22:33:02

标签: excel excel-vba excel-formula vlookup worksheet-function vba

我能够使用Python解决这个问题,但我还需要在Excel中实现解决方案,以便我可以使用图形轻松地表示结果。

鉴于此表:

b   a   c
c   a   b 
a   c   b
a   c
a   c   d
b   c   a
d   c   a

我想获得一个列表,该列表按表中重复行(无特定顺序)的次数排序。

  • 因此,这将被视为重复的行:" a b c"," c b a"," c b"
  • 但这不会:" ab c"," b c"," b"," a b", " a c"

因此,我正在寻找的输出将是:

1st place: "b+a+c" found 4 times
2nd place: "a+c+d" found 2 twice
3rd place: "a+c" found once

输出必须说" b + a + c"即使它也在计算" a + b + c"," c + b + a"等等...因为" b + a + c"是所有其他后续重复中的第一个。

有人能告诉我解决问题的正确方法吗?

6 个答案:

答案 0 :(得分:3)

我建议你另外一种方法来解决这个问题。

您可以将b c d转换为1 2 4 8(二进制为01 10 100 1000)。

a+b+c = a+c+b =... = 7 (111)
a+c = c+a = 5 (101)

因此,您可以使用sum值在excel中进行分组。

将单个字符转换为数字的功能非常简单:

A B C POWER(2,CODE(A2) - 97)  POWER(2,CODE(A2) - 97)  POWER(2,CODE(A2) - 97)  SUM(D2:F2)
-+-+-+-----------------------+-----------------------+-----------------------+----------
b|a|c|2                      |1                      |4                      |7
c|a|b|4                      |1                      |2                      |7
a|c|b|1                      |4                      |2                      |7
a|c| |1                      |4                      |0                      |5
a|c|d|1                      |4                      |8                      |13
b|c|a|2                      |4                      |1                      |7
d|c|a|8                      |4                      |1                      |13

希望这种方法可以帮助您找到解决问题的方法。

答案 1 :(得分:2)

我会使用Class模块和集合对象。类模块由两个数组和一个计数器组成。第一个数组是原始顺序的行;第二个数组是按排序顺序排列的行。排序的顺序将用作集合对象的Key。如果您尝试添加Key已存在的集合对象,则会导致错误。捕获错误并将其添加到计数器中。

然后,对于结果,您将从"原始"中检索原始条目。阵列;和柜台。在柜台上排序,你就有了结果。

以下是完成上述操作的VBA代码示例。

首先,插入一个Class模块并将其重命名为RowEntries

Option Explicit
Private pOriginal() As Variant
Private pSorted() As Variant
Private pCount As Long

Public Property Get Original() As Variant
    Original = pOriginal
End Property
Public Property Let Original(Value As Variant)
    pOriginal = Value
End Property

Public Property Get Sorted() As Variant
    Sorted = pSorted
End Property
Public Property Let Sorted(Value As Variant)
    pSorted = Value
End Property

Public Property Get Count() As Long
    Count = pCount
End Property
Public Property Let Count(Value As Long)
    pCount = Value
End Property

然后插入常规模块。此代码假设您的源数据是A1周围的CurrentRegion;结果将在右边的几列中显示。这些算法很容易改变。

Option Explicit
Option Compare Text  'To make comparison case insensitive, if you want
Sub RankRows()
    Dim V As Variant, VtoSort As Variant
    Dim vRes() As Variant
    Dim cRowEntries As RowEntries
    Dim colRowEntries As Collection
    Dim sKey As String, S As String
    Dim I As Long
    Dim rSrc As Range, rRes As Range  'Location for Results

Set rSrc = Range("A1").CurrentRegion
Set rRes = rSrc.Offset(columnoffset:=rSrc.Columns.Count + 3).Resize(1, 2)

V = rSrc

Set colRowEntries = New Collection
On Error Resume Next
For I = 1 To UBound(V)
    Set cRowEntries = New RowEntries
    With cRowEntries
        .Original = WorksheetFunction.Index(V, I, 0)
        VtoSort = .Original
        Quick_Sort VtoSort, LBound(VtoSort), UBound(VtoSort)
        .Sorted = VtoSort
        .Count = 1
        sKey = CStr(Join(.Sorted, ", "))
        colRowEntries.Add cRowEntries, sKey
        If Err.Number <> 0 Then
            Err.Clear
            With colRowEntries(sKey)
                .Count = .Count + 1
            End With
        End If
    End With
Next I
On Error GoTo 0

'populate results array
ReDim vRes(1 To colRowEntries.Count, 1 To 2)
For I = 1 To colRowEntries.Count
    With colRowEntries(I)
        vRes(I, 1) = Join(.Original, "+")

            'remove trailing delimiters
            Do While Right(vRes(I, 1), 1) = "+"
                vRes(I, 1) = Left(vRes(I, 1), Len(vRes(I, 1)) - 1)
            Loop

        vRes(I, 2) = .Count
    End With
Next I

Set rRes = rRes.Resize(rowsize:=UBound(vRes), columnsize:=UBound(vRes, 2))
With rRes
    .EntireColumn.Clear
    .Value = vRes
    .Sort key1:=rRes.Columns(2), order1:=xlDescending, Header:=xlNo
End With

V = rRes
ReDim vRes(1 To UBound(V), 1 To 1)

For I = 1 To UBound(V)
    Select Case V(I, 2)
        Case 1
            S = "once"
        Case 2
            S = "twice"
        Case Else
            S = V(I, 2) & " times"
    End Select
    vRes(I, 1) = OrdinalNum(I) & " place: """ & V(I, 1) & """ found " & S

Next I

rRes.EntireColumn.Clear
rRes.Resize(columnsize:=1) = vRes
rRes.EntireColumn.AutoFit

End Sub


Sub Quick_Sort(ByRef SortArray As Variant, ByVal first As Long, ByVal last As Long)
Dim Low As Long, High As Long
Dim Temp As Variant, List_Separator As Variant
Low = first
High = last
List_Separator = SortArray((first + last) / 2)
Do
    Do While (SortArray(Low) < List_Separator)
        Low = Low + 1
    Loop
    Do While (SortArray(High) > List_Separator)
        High = High - 1
    Loop
    If (Low <= High) Then
        Temp = SortArray(Low)
        SortArray(Low) = SortArray(High)
        SortArray(High) = Temp
        Low = Low + 1
        High = High - 1
    End If
Loop While (Low <= High)
If (first < High) Then Quick_Sort SortArray, first, High
If (Low < last) Then Quick_Sort SortArray, Low, last
End Sub

Function OrdinalNum(num) As String
Dim Suffix As String

OrdinalNum = num
If Not IsNumeric(num) Then Exit Function
If num <> Int(num) Then Exit Function

Select Case num Mod 10
    Case Is = 1
        Suffix = "st"
    Case Is = 2
        Suffix = "nd"
    Case Is = 3
        Suffix = "rd"
    Case Else
        Suffix = "th"
End Select

Select Case num Mod 100
    Case 11 To 19
        Suffix = "th"
End Select

OrdinalNum = Format(num, "#,##0") & Suffix
End Function

输出将与您在上面的请求中显示的一样。但可以很容易地修改:

enter image description here

答案 2 :(得分:2)

这个问题非常有趣。它是展示如何使用数学来提供更简单解决方案的好样本。

我必须添加另一个答案,因为我意识到发现三个单词的重复组合与从零点计算distance in three-space相同 - 只需要给每个单词一个不同的数字。这个答案可以解决Pnuts之前提到的问题。

与我的上一个答案不同,如果你在三个成员中有200个短语和组合,计算的最大数字是120000(POWER(200,2)* 3),我的最后答案是1.60694E + 60(POWER(2,200) )。我的上一个答案可能在逻辑上解决了问题,但无法在Excel或许多编程语言中实现。它使用 permutations 解决方案来解决组合问题。

以下是使用三维空间距离的解决方案,它简单且易于扩展。

enter image description here

  1. 将每个单词映射到不同的数字。 (VLOOKUP是一种方法,你可能有其他方法。)结果数字不需要是连续的,只是彼此不同,最大数量应小于SQRT(POWER(2,32)/ 3 ))。
  2. 使用G1中的公式计算距离。
  3. 组和计数使用G列。(您可以在其他答案中找到方法。)
  4. 注意:我使用&#39; _&#39;替换空格单元格,为空间映射数字,这样就可以使a_a等于aa_(第4和第5行)。任何选择都应该有空格数。
  5. 任何改善这个答案的建议都将受到赞赏。

答案 3 :(得分:1)

我这样做的方法是使用字典浏览列表并计算行数。关键是行本身,所以我可以使用字典的Dictionary.Exists(Key)方法来查看我是否已经遇到该行。与每个键关联的值将是一个整数,每当我再次遇到同一行时,我都会递增。

解析列表后,我会迭代字典,将它们的键和值输出到excel中的列。 最后,我会在输出结果的范围内使用sort来按频率对它们进行排序。

这很简单,但您需要引用Microsoft Scripting Runtime来使用字典对象(请参阅此处http://www.techbookreport.com/tutorials/vba_dictionary.html)。

希望这有帮助。

<强>更新

既然你说过你可以在vba中试试这个方法。当我第一次使用CollectionDictionary对象时,我想我会添加一些总是绊倒我的东西。迭代条目时,迭代变量必须是Variant。我习惯于必须声明与迭代的数据类型相同的迭代变量,但这会在vba中给出错误。

答案 4 :(得分:1)

几乎只有公式的解决方案,假设数据在标记为ColumnsA:C,在D2中:

=VLOOKUP(A2,weight,2,0)+IFNA(VLOOKUP(B2,weight,2,0),)+IFNA(VLOOKUP(C2,weight,2,0),)  

向下复制到suit,其中weight(图像中的绿色)是查找表的命名范围(按照@Jaugar Chang建议的行构建)。在E2中并复制到适合:

 =IF(COUNTIF(D$2:D2,D2)=1,COUNTIF(D:D,D2),"")  
G1中的

=ROW()&MID("thstndrdthstndrdth",MATCH(IF(MOD(ROW(),100)>29,MOD(ROW(),10)+20,MOD(ROW(),100)),{0,1,2,3,4,21,22,23,24},1)*2-1,2)&" place: """&INDIRECT("A"&MATCH(H1,E:E,0))&"+"&INDIRECT("B"&MATCH(H1,E:E,0))&"+"&INDIRECT("C"&MATCH(H1,E:E,0))&""" found"  
H1中的

=LARGE(E:E,ROW())  
在I1中

=IF(H1>2,"times",IF(H1=1,"","twice"))

最后三个中的每一个都向下复制,直到错误消息为止。

ColumnH格式化:

[=1] "once";General

输出以黄色突出显示:

SO25070024 example

在此示例中,存在盈余+以及++盈余的可能性。

答案 5 :(得分:1)

这是我的版本使用数组操作,然后是一些范围操作。

编辑1:我已经阅读过pnut关于仅处理b的评论。顺便说一句,这不会处理+ a

Sub Test()
    Dim arr, unq
    Dim orng As Range, rng As Range, srng As Range
    Dim i As Long, k As Long
    Dim check As Boolean: check = False
    Dim freq As String
    '~~> pass range data to array
    Set orng = Sheet1.Range("A1", _
        Sheet1.Range("A" & Sheet1.Rows.Count).End(xlUp))
    For Each rng In orng
        If Not IsArray(arr) Then
            arr = Array(RngToArr(rng.Resize(, 3)))
        Else
            ReDim Preserve arr(UBound(arr) + 1)
            arr(UBound(arr)) = RngToArr(rng.Resize(, 3))
        End If
    Next
    '~~> pass unique combination and count to another array
    For i = LBound(arr) To UBound(arr)
        If IsEmpty(unq) Then
            ReDim unq(1 To 2, 1 To 1)
            unq(1, 1) = arr(i)
            unq(2, 1) = unq(2, 1) + 1
        Else
            For k = LBound(unq, 2) To UBound(unq, 2)
                If CompArr(arr(i), unq(1, k)) Then
                    check = False
                    unq(2, k) = unq(2, k) + 1
                    Exit For
                Else
                    check = True
                End If
            Next
            If check Then
                ReDim Preserve unq(1 To 2, 1 To UBound(unq, 2) + 1)
                unq(1, UBound(unq, 2)) = arr(i)
                unq(2, UBound(unq, 2)) = unq(2, UBound(unq, 2)) + 1
            End If
        End If
    Next
    '~~> Transpose and tidy up the array
    ReDim tally(1 To UBound(unq, 2), 1 To 2)
    For i = LBound(unq, 2) To UBound(unq, 2)
        tally(i, 1) = Join$(unq(1, i), "+")
        tally(i, 2) = unq(2, i)
    Next
    '~~> sort in worksheet, easier than sorting array
    With Sheet1
        Set srng = .Range("E1:F" & UBound(tally, 1))
        srng = tally
        .Sort.SortFields.Clear
        .Sort.SortFields.Add Key:=srng.Offset(0, 1).Resize(, 1), _
            SortOn:=xlSortOnValues, Order:=xlDescending, _
            DataOption:=xlSortNormal
        With .Sort
            .SetRange srng
            .Header = xlGuess
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With
    End With
    '~~> do some manipulation to make it closer to what you want
    For Each rng In srng.Offset(0, 1).Resize(, 1)
        Select Case rng.Value
        Case 1: freq = "found once"
        Case 2: freq = "found twice"
        Case Else: freq = "found " & rng.Value & " times"
        End Select
        rng.Value = freq
    Next
End Sub

Private Function CompArr(list1, list2) As Boolean
    Dim j As Long: CompArr = True
    For j = LBound(list1) To UBound(list1)
        With Application
            If IsError(.Match(list1(j), list2, 0)) _
                Then CompArr = False
        End With
    Next
End Function

Private Function RngToArr(r As Range) As Variant
    Dim c As Range, a
    For Each c In r
        If Len(c.Value) <> 0 Then
            If Not IsArray(a) Then
                a = Array(c.Value)
            Else
                ReDim Preserve a(UBound(a) + 1)
                a(UBound(a)) = c.Value
            End If
        End If
    Next
    RngToArr = a
End Function

<强>结果:

enter image description here

不完全按照你想要的方式,我无法想出如何动态设置第一名,第二名等。 另外,我没有深入加号(+)。如果有空白,结果可能是+ b + c,或+ + c +或a ++ c 无论如何,HTH。