建立我过去的一个questions
我想要完成的事情:
我希望根据多个条件使用VBA代码查找并突出显示重复的Upcharge:
如果电子表格中有多个实例/行共享/匹配所有这些条件,则表示Upcharge是重复的。正如我上面链接的帖子所示:
我尝试了什么:
scripting.dictionary
,但我不确定如何(或者如果)使用多维数组。现在我终于找到了我认为会更快的方法,
我想要使用的更快的方法: 将上述列转储到多维数组中,在数组中找到重复的“行”,然后突出显示相应的电子表格行。
我尝试更快的方法: 这是我填充多维数组的方法
Sub populateArray()
Dim arrXID() As Variant, arrUpchargeOne() As Variant, arrUpchargeTwo() As Variant, arrUpchargeType() As Variant, arrUpchargeLevel() As Variant
Dim arrAllData() As Variant
Dim i As Long, lrow As Long
lrow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
arrXID = Range("A2:A" & lrow) 'amend column number
arrUpchargeOne = Range("CT2:CT" & lrow)
arrUpchargeTwo = Range("CU2:CU" & lrow)
arrUpchargeType = Range("CV2:CV" & lrow)
arrUpchargeLevel = Range("CW2:CW" & lrow)
ReDim arrAllData(1 To UBound(arrXID, 1), 4) As Variant
For i = 1 To UBound(arrXID, 1)
arrAllData(i, 0) = arrXID(i, 1)
arrAllData(i, 1) = arrUpchargeOne(i, 1)
arrAllData(i, 2) = arrUpchargeTwo(i, 1)
arrAllData(i, 3) = arrUpchargeType(i, 1)
arrAllData(i, 4) = arrUpchargeLevel(i, 1)
Next i
End Sub
我可以将列添加到数组中,但是我从那里被卡住了。我不确定如何检查数组中的重复“行”。
我的问题:
上一篇文章中的公式供参考:
=AND(SUMPRODUCT(($A$2:$A$" & lastRow & "=$A2)*($CT$2:$CT$" & lastRow & "=$CT2)*($CU$2:$CU$" & lastRow & "=$CU2)*($CV$2:$CV$" & lastRow & "=$CV2)*($CW$2:$CW$" & lastRow & "=$CW2))>1,$CT2 <> """")"
Returns TRUE if Upcharge is a duplicate
答案 0 :(得分:4)
你说识别重复;我听到Scripting.Dictionary对象。
Public Sub lminyDupes()
Dim d As Long, str As String, vAs As Variant, vCTCWs As Variant
Dim dDUPEs As Object '<~~ Late Binding
'Dim dDUPEs As New Scripting.Dictionary '<~~ Early Binding
Debug.Print Timer
Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging
'Remove the next line with Early Binding¹
Set dDUPEs = CreateObject("Scripting.Dictionary")
dDUPEs.comparemode = vbTextCompare
With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
With .Cells(1, 1).CurrentRegion
With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
.Columns(97).Interior.Pattern = xlNone '<~~ reset column CS
'the following is intended to mimic a CF rule using this formula
'=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, SIGN(LEN(CT2)))
vAs = .Columns(1).Value2
vCTCWs = Union(.Columns(98), .Columns(99), .Columns(100), .Columns(101)).Value2
For d = LBound(vAs, 1) To UBound(vAs, 1)
If CBool(Len(vCTCWs(d, 1))) Then
'make a key of the criteria values
str = Join(Array(vAs(d, 1), vCTCWs(d, 1), vCTCWs(d, 2), vCTCWs(d, 3), vCTCWs(d, 4)), ChrW(8203))
If dDUPEs.exists(str) Then
'the comboned key exists in the dictionary; append the current row
dDUPEs.Item(str) = dDUPEs.Item(str) & Chr(44) & "CS" & d
Else
'the combined key does not exist in the dictionary; store the current row
dDUPEs.Add Key:=str, Item:="CS" & d
End If
End If
Next d
'reuse a variant var to provide row highlighting
Erase vAs
For Each vAs In dDUPEs.keys
'if there is more than a single cell address, highlight all
If CBool(InStr(1, dDUPEs.Item(vAs), Chr(44))) Then _
.Range(dDUPEs.Item(vAs)).Interior.Color = vbRed
Next vAs
End With
End With
End With
dDUPEs.RemoveAll: Set dDUPEs = Nothing
Erase vCTCWs
Application.ScreenUpdating = True
Debug.Print Timer
End Sub
这似乎比公式方法更快。
¹如果您打算将Scripting.Dictionary对象的后期绑定转换为早期绑定,则必须将 Microsoft Scripting Runtime 添加到VBE的工具►参考。
答案 1 :(得分:1)
<强> SUMPRODUCT vs COUNTIFS 强>
首先,您选择的功能不适合这么多行以及几个条件。 COUNTIFS function可以执行SUMPRODUCT function可以执行的许多相同的多标准操作,但通常可以计算25-35%的计算负载和时间。此外,可以在COUNTIFS中使用完整列引用而不会造成损害,因为列引用在Worksheet.UsedRange property的限制内部被截断。
您的标准公式可以用COUNTIFS编写,
=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, CT2<>"")
'... or,
=COUNTIFS(A:A, A2, CT:CT, CT2, CT:CT, "<>", CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1
将非空白列CT条件直接引入COUNTIFS函数实际上略微改善了计算时间。
<强> Only Calculate When You Have To 强>
原始公式可以分解为两个主要条件。
如果条件不正确,则基本IF function停止处理。如果将CT列中非空单元格的测试移入包装IF,则只有当前行的CT列中存在值时才会处理COUNTIFS(计算的大部分)。
改进的标准公式变为,
=IF(CT2<>"", COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1)
此修改的好处取决于CT列中的空白单元格数。如果15,000个细胞中只有1%是空白的,那么注意到很少的改善。但是,如果CT列中50%的细胞通常是空白的,那么将会有很大的改进,因为您实际上将计算周期缩减了一半。
<强> Sorting the Data to Limit the Ranges 强>
到目前为止,最大的计算寄生虫是COUNTIFS在五个单独的列中查看15,000行数据。如果数据在一个或多个条件列上排序,则无需查看所有15,000行以匹配所有五列标准。
出于本练习的目的,将假设列A以升序方式排序。如果您想测试此处讨论的假设,请立即对数据进行排序。
INDEX function不只是返回一个值;它实际返回一个有效的单元格地址。当在最常见的查找容量中使用时,您会看到返回的值,但实际上,与仅返回单元格值的类似VLOOKUP操作不同,INDEX返回实际的单元格;例如=A1
,而不是A1包含的99
。此超级功能可用于创建可用于其他功能的有效范围。例如A2:A9
也可以写为INDEX(A:A, 2):INDEX(A:A, 9)
。
此功能无法在条件格式规则中直接使用。但是,可以在 Named Range 中使用,命名范围可用于条件格式规则。
Sub lminyCFrule()
Debug.Print Timer
'Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging
On Error Resume Next '<~~ needed for deleting objects without checking to see if they exist
With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
If .AutoFilterMode Then .AutoFilterMode = False
'delete any existing defined name called 'localXID' or 'local200'
With .Parent
.Names("localXID").Delete
.Names("local200").Delete
End With
'create a new defined name called 'localXID' for CF rule method 1
.Names.Add Name:="localXID", RefersToR1C1:= _
"=INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1, 0), 0):" & _
"INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1 ), 0)"
'create a new defined name called 'local200' for CF rule method 2
.Names.Add Name:="local200", RefersToR1C1:= _
"=INDEX(Upcharge!C1:C104, MAX(2, ROW()-100), 0):INDEX(Upcharge!C1:C101, ROW()+100, 0)"
With .Cells(1, 1).CurrentRegion
'sort on column A in ascending order
.Cells.Sort Key1:=.Columns(1), Order1:=xlAscending, _
Orientation:=xlTopToBottom, Header:=xlYes
'create a CF rule on column CS
With .Resize(.Rows.Count - 1, 1).Offset(1, 96)
With .FormatConditions
.Delete
' method 1 and method 2. Only use ONE of these!
' method 1 - definitively start and end of XIDs in column A (slower, no mistakes)
'.Add Type:=xlExpression, Formula1:= _
"=IF(CT2<>"""", COUNTIFS(INDEX(localXID, 0, 1), A2, INDEX(localXID, 0, 98), CT2," & _
"INDEX(localXID, 0, 99), CU2, INDEX(localXID, 0, 100), CV2," & _
"INDEX(localXID, 0, 101), CW2)-1)"
' method 2 - best guess at start and end of XIDs in column A (faster, guesswork at true scope)
.Add Type:=xlExpression, Formula1:= _
"=IF(CT2<>"""", COUNTIFS(INDEX(local200, 0, 1), A2, INDEX(local200, 0, 98), CT2," & _
"INDEX(local200, 0, 99), CU2, INDEX(local200, 0, 100), CV2," & _
"INDEX(local200, 0, 101), CW2)-1)"
End With
.FormatConditions(.FormatConditions.Count).Interior.ColorIndex = 3
End With
'Filter based on column CS is red
.Columns(97).AutoFilter Field:=1, Criteria1:=vbRed, Operator:=xlFilterCellColor
End With
End With
Application.ScreenUpdating = True
Debug.Print Timer
End Sub
虽然没有快速尖叫,但这很方便。最佳猜测&#39;比最终的开始和结束更快&#39;但是你冒着不完全覆盖A栏中重复范围的风险。当然,可以调整控制范围的偏移量(例如100上下)。
答案 2 :(得分:0)
考虑一个SQL解决方案,因为这是一个典型的aggregate group by query,您可以在其中筛选大于1的计数。要了解您的路径,需要在循环中跨越阵列的所有元素使用许多条件逻辑。
虽然我建议您只是将数据导入数据库,如Excel的兄弟MS Access,但Excel可以使用ADO connection在自己的工作簿上运行SQL语句(不是为了详细说明,而是Excel和Access使用相同的Jet / ACE引擎)。还有一件好事是你似乎被设置为使用像命名列的结构这样的表来运行这样的查询。
以下示例在名为 Data (Data$
)的工作表中引用您的字段,并将输出查询到名为 Results (带标题)的工作表。根据需要更改名称。包括两个连接字符串(其中一个被注释掉)。希望它能在你的最后运行!
Sub RunSQL()
Dim conn As Object, rst As Object
Dim i As Integer, fld As Object
Dim strConnection As String, strSQL As String
Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
' Connection and SQL Strings
' strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
' & "DBQ=C:\Path\To\Workbook.xlsm;"
strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
& "Data Source='C:\Path\To\Workbook.xlsm';" _
& "Extended Properties=""Excel 8.0;HDR=YES;"";"
strSQL = " SELECT [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
& " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
& " [Data$].[Upcharge Type], [Data$].[Upcharge Level]" _
& " FROM [Data$]" _
& " GROUP BY [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
& " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
& " [Data$].[Upcharge Type], [Data$].[Upcharge Level]," _
& " [Data$].[Product's XID]" _
& " HAVING COUNT(*) > 1;"
' Open the db connection
conn.Open strConnection
rst.Open strSQL, conn
' Column headers
i = 0
Worksheets("Results").Range("A1").Activate
For Each fld In rst.Fields
ActiveCell.Offset(0, i) = fld.Name
i = i + 1
Next fld
' Data rows
Worksheets("Results").Range("A2").CopyFromRecordset rst
rst.Close
conn.Close
End Sub
答案 3 :(得分:0)
为什么不删除Indirect()
并将Countif()
函数替换为一些稳定的Row
引用。由于Indirect()
部分是易变的,而不是使用Indirect()
,您可以立即使用一些稳定的行引用,例如$A$2:$A$50000
,这可能会显示性能方面的一些重大变化。
或者
使用Create Table作为数据。在公式中使用表格参考,这将比Indirect()
参考更快。
修改强>
您的实际公式
=AND(SUMPRODUCT(($A$2:$A$500=$A2)*($CU$2:$CU$500=$CU2)*($CV$2:$CV$500=$CV2)*($CW$2:$CW$500=$CW2)*($CX$2:$CX$500=$CX2))>1,$CU2 <> "")
为什么不用稳定的参考将其转换为Counti(S)
,如下所示?
=AND(COUNTIFS($A$2:$A$500,$A2,$CU$2:$CU$500,$CU2,$CV$2:$CV$500,$CV2,$CW$2:$CW**$500,$CW2,$CX$2:$CX$500,$CX2)>1,$CU12<>"")
答案 4 :(得分:0)
这可能就像一个魔术,但不确定它是否会起作用。
您是否可以创建另一个支持性(临时)列,连接所有四个条件?
ZZ_Temp = concatenate(CS; CV; CZ; etc)
这样,我想,你可以更快地显示/突出显示重复项。