我无法确定在相邻单元格中输入1
或0
以指示在处理大型数据集时值是否唯一的方法。我已经阅读了多种方法来实现这一点,但是对于我的目的来说,它们似乎都没有效率:我使用的是Excel 2010实例(所以我不拥有 Distinct Count
在这个StackOverflow问题中:Simple Pivot Table to Count Unique Values有使用SUMPRODUCT
或COUNTIF
的建议,但是当我处理50,000多行时,这会导致糟糕的性能和文件大小~35 MB而不是~3 MB。我想知道对于大型动态数据集是否有更好的解决方案,无论是公式还是VBA。
我想要完成的一个例子是( Unique 列是相邻的单元格):
Name Week Unique
John 1 1
Sally 1 1
John 1 0
Sally 2 1
我尝试编写COUNTIF
的相同功能,但没有成功:
For Each Cell In ThisWorkbook.Worksheets("Overtime & Type Data").Range("Z2:Z" & DataLastRow)
If Worksheets("Overtime & Type Data").Cells(Cell.Row, 26) <> Worksheets("Overtime & Type Data").Cells(Cell.Row - 1, 26) Then
FirstCell = Cell.Row
End If
If (Worksheets("Overtime & Type Data").Range(Cells(FirstCell, 26), Cells(Cell.Row, 26)) = Worksheets("Overtime & Type Data").Range(Cells(Cell.Row, 26))) = True Then
Cell.Value = 1
Else
Cell.Value = 0
End If
Next Cell
答案 0 :(得分:2)
此代码在不到3秒的时间内成功运行了超过130,000行。调整列字母以适合您的数据集。
Sub tgr()
Const colName As String = "A"
Const colWeek As String = "B"
Const colOutput As String = "C"
Dim ws As Worksheet
Dim rngData As Range
Dim DataCell As Range
Dim rngFound As Range
Dim collUniques As Collection
Dim arrResults() As Long
Dim ResultIndex As Long
Dim UnqCount As Long
Set ws = ThisWorkbook.Sheets("Overtime & Type Data")
Set rngData = ws.Range(colName & 2, ws.Cells(Rows.Count, colName).End(xlUp))
Set collUniques = New Collection
ReDim arrResults(1 To rngData.Cells.Count, 1 To 1)
On Error Resume Next
For Each DataCell In rngData.Cells
ResultIndex = ResultIndex + 1
collUniques.Add ws.Cells(DataCell.Row, colName).Value & ws.Cells(DataCell.Row, colWeek).Value, ws.Cells(DataCell.Row, colName).Value & ws.Cells(DataCell.Row, colWeek).Value
If collUniques.Count > UnqCount Then
UnqCount = collUniques.Count
arrResults(ResultIndex, 1) = 1
Else
arrResults(ResultIndex, 1) = 0
End If
Next DataCell
On Error GoTo 0
ws.Cells(rngData.Row, colOutput).Resize(rngData.Cells.Count).Value = arrResults
End Sub
答案 1 :(得分:0)
一种方法是按名称和周排序。然后,您可以通过与前一行进行比较来确定任何行的唯一性。
如果您需要保留订单,可以先写一列索引号(1,2,3,...)来跟踪订单。计算唯一后,按索引排序以恢复原始订单。
整个过程可以通过相对较少的步骤手动完成,也可以使用VBA自动完成。
答案 2 :(得分:0)
我不确定这对50000值的效果如何,但它在大约一秒钟内会达到~1500。
Sub unique()
Dim myColl As New Collection
Dim isDup As Boolean
Dim myValue As String
Dim r As Long
On Error GoTo DuplicateValue
For r = 1 To Sheet1.UsedRange.Rows.Count
isDup = False
'Combine the value of the 2 cells together
' and add that string to our collection
'If it is already in the collection it errors
myValue = Sheet1.Cells(r, 1).Value & Sheet1.Cells(r, 2).Value
myColl.Add r, myValue
If isDup Then
Sheet1.Cells(r, 3).Value = "0"
Else
Sheet1.Cells(r, 3).Value = "1"
End If
Next
On Error GoTo 0
Exit Sub
DuplicateValue:
'The value is already in the collection so put a 0
isDup = True
Resume Next
End Sub
答案 3 :(得分:0)
几乎任何批量操作都会击败涉及工作表单元格的循环。您可以通过在内存中执行所有计算来缩短时间,并在完成后仅将值返回到工作表 en masse 。
Sub is_a_dupe()
Dim v As Long, vTMP As Variant, vUNQs As Variant, dUNQs As Object
Debug.Print Timer
On Error GoTo bm_Uh_Oh
Set dUNQs = CreateObject("Scripting.Dictionary")
With Worksheets("Sheet1")
vTMP = .Range(.Cells(2, 1), .Cells(Rows.Count, 2).End(xlUp)).Value2
ReDim vUNQs(1 To UBound(vTMP, 1), 1 To 1)
For v = LBound(vTMP, 1) To UBound(vTMP, 1)
If dUNQs.Exists(Join(Array(vTMP(v, 1), vTMP(v, 2)))) Then
vUNQs(v, 1) = 0
Else
dUNQs.Add Key:=Join(Array(vTMP(v, 1), vTMP(v, 2))), _
Item:=vTMP(v, 2)
vUNQs(v, 1) = 1
End If
Next v
.Cells(2, 3).Resize(UBound(vUNQs, 1), 1) = vUNQs
End With
Debug.Print Timer
bm_Uh_Oh:
dUNQs.RemoveAll
Set dUNQs = Nothing
End Sub
以前的经验告诉我,各种各样的数据(以及硬件等)会影响流程的时间安排,但在随机抽样数据中,我收到了这些经过的时间。
50K记录..... 0.53秒
130K记录...... 1.32秒
500K记录.... 4.92秒