感谢您阅读我的问题,
我收到了一份大约250,000个条目的列表,以及每个条目在他们登录时显示的姓名和登录日期。我的任务是找出连续几天登录的用户,频率和次数。
即。鲍勃史密斯连续3天一次,连续5天3次。 乔史密斯连续8天一次,连续5天8次 等
我是VBA的新手,并且一直在努力编写一个程序来执行此操作。 代码:
Option Explicit
Option Base 1
Sub CountUUIDLoop()
Dim UUID As String
Dim Day As Date
Dim Instance() As Variant
ReDim Instance(50, 50)
Dim CountUUID As Variant
Dim q As Integer
Dim i As Long
Dim j As Long
Dim f As Integer
Dim g As Integer
Dim LastRow As String
f = 1
q = 1
g = 2
LastRow = Cells.Find("*", [A1], , , xlByRows, xlPrevious).Row
For i = q To LastRow
UUID = Cells(i, "A")
Instance(f, 1) = UUID
g = 2
For j = 1 To LastRow
If UUID = Cells(j, "A") Then
Instance(f, g) = Cells(j, "B")
g = g + 1
End If
Next j
f = f + 1
q = g - 1
Next i
End Sub
此代码的目标是浏览条目并将其存储在数组' Instance'使得2D阵列看起来像[UUID1,B1,B2,B3] [UUID2,B1,B2,B3,B4] [UUID3,B1,B2]
如果UUID是用户,则B1表示用户登录的日期,b2表示他们登录的下一个日期等。某些用户的日期多于或少于其他用户。
我的主要问题是设置阵列,因为我不断遇到不同的错误。我不确定如何部分地定义这个2D阵列,因为将有超过30 000行,每行有1-> 85列。
感谢任何帮助,如果有任何需要进一步澄清,请告诉我。这是我第一次使用VBA,所以我很抱歉,如果我所做的一切都是错的。
P.S。我使用ReDim Instance(50,50)作为测试,看看我是否可以通过预定义使其工作但发生了相同的错误。再次感谢!
答案 0 :(得分:1)
我建议使用集合和字典而不是数组。下面的代码将以与您想要的方式非常相似的方式构建数据。
Sub collect_logins_by_user_()
'you need to enable the microsoft scripting runtime
'in tools - references
'assuming unique ids are in col A and there are no gaps
'and assuming dates in col B and there are no gaps
'
'The expected runtime for this is O(n) and I have used similar code on more than 250.000 record.
'It still takes a while obviously, but should run just fine.
'
'The the data will bestructed in the following format:
'{id_1: [d_1, d_2,...], id_2: [d_3, d_4,...], ...}
Dim current_id As Range: Set current_id = ActiveSheet.Range("A2") 'modify range as required
Dim logins_by_users As New Dictionary
While Not IsEmpty(current_id)
If Not logins_by_users.Exists(current_id.Value) Then
Set logins_by_users(current_id.Value) = New Collection
End If
logins_by_users(current_id.Value).Add current_id.Offset(ColumnOffset:=1).Value
Set current_id = current_id.Offset(RowOffset:=1)
Wend
'Once you have the data structured, you can do whatever you want with it.
'like printing it to the immediate window.
Dim id_ As Variant
For Each id_ In logins_by_users
Debug.Print "======================================================="
Debug.Print id_
Dim d As Variant
For Each d In logins_by_users(id_)
Debug.Print d
Next d
Next id_
Debug.Print "======================================================="
End Sub
答案 1 :(得分:0)
我编写了一些代码,它们按照您要执行的操作执行某些操作 - 它会向调试窗口打印每个用户的不同连续日志数,并以逗号分隔。
这段代码使用了字典对象 - 它本质上是一个关联数组,其中索引不像数组中那样被限制为数字,并且提供了一些方便的功能来操作数组的数据。
我已在包含colomn A中的用户ID和B列中的日志日期的工作表上测试了这一点 - 包括标题 - 这看起来工作正常。免费尝试一下
Sub mysub()
Dim dic As Object
Dim logs As Variant
Dim myval As Long
Dim mykey As Variant
Dim nb As Long
Dim i As Long
Set dic = CreateObject("Scripting.dictionary")
'CHANGE TO YOUR SHEET REFERENCE HERE
For Each cell In Range(Cells(2, 1), Cells(Worksheets("Sheet8").Rows.count, 1).End(xlUp))
mykey = cell.Value
myval = cell.Offset(0, 1)
If myval <> 0 Then
On Error GoTo ERREUR
dic.Add mykey, myval
On Error GoTo 0
End If
Next cell
For Each Key In dic
logs = Split(dic(Key), ",")
logs = sortArray(logs)
i = LBound(logs) + 1
nb = 1
Do While i <= UBound(logs)
Do While CLng(logs(i)) = CLng(logs(i - 1)) + 1
nb = nb + 1
i = i + 1
Loop
If nb > 1 Then
tot = tot & "," & CStr(nb)
nb = 1
End If
i = i + 1
Loop
If tot <> "" Then dic(Key) = Right(tot, Len(tot) - 1)
Debug.Print "User: " & Key & " - Consecutive logs: " & dic(Key)
tot = ""
mys = ""
Next Key
Exit Sub
ERREUR:
If myval <> 0 Then dic(mykey) = dic(mykey) & "," & CStr(myval)
Resume Next
End Sub
Function sortArray(a As Variant) As Variant
For i = LBound(a) + 1 To UBound(a)
j = i
Do While a(j) < a(j - 1)
temp = a(j - 1)
a(j - 1) = a(j)
a(j) = temp
j = j - 1
If j = 0 Then Exit Do
Loop
Next i
sortArray = a
End Function
答案 2 :(得分:0)
据我理解你的问题和代码,你有一个表格如下:
..............的 A 强> .................的乙强>
1 ........ LOGIN1 ....... DATE1
2 ........ LOGIN1 ....... DATE2
3 ........ LOGIN1 ....... DATE3
4 ........ Login2身份....... DATE4
5 ........ Login2身份....... DATE5
6 ........ LOGIN3 ....... DATE6
此代码中的任务是获取2D结构中的数据,如下所示:
RESULT_ARRAY-
............................ | -LOGIN1-
............................................ | -DATE1
............................................ | -DATE2
............................................ | -DATE3
............................ | -LOGIN2-
............................................ | -DATE4
............................................ | -DATE5
............................ | -LOGIN3-
............................................ | -DATE6
首先,您需要知道代码中出了什么问题。请参阅下面的代码中的注释,找出错误原因:
Option Explicit
Option Base 1
Sub CountUUIDLoop()
Dim UUID As String
Dim Day As Date
Dim Instance() As Variant ' If you are using variant data type, it is not necesary to point it: default data type in VBA is Variant. Just write like this: "Dim Instance()"
ReDim Instance(50, 50) ' Limitation in 50 may be the reason, why your script is going into "out of range" error.
' Remember, that this operation means, that your array now will have following dimentions: [1..50,1..50]
Dim CountUUID As Variant 'Just write like this: "Dim CountUUID"
Dim q As Integer ' you can describe all your variables in one line, like this: "Dim q as Integer,f as Integer,g as Integer"
Dim i As Long
Dim j As Long
Dim f As Integer
Dim g As Integer
Dim LastRow As String ' first mistake: you are using String data type to perform numeric operations below in your FOR-cycle
f = 1 ' Your Instance array index starts from {0} and you are not using this index by starting from {1}.
q = 1 ' The reason to use this variable is not obvious. You could just use constant in FOR cycle below and avoid unnecessary variables.
g = 2 ' You could remove this line, because this var is set every time in cycle below (before second FOR)
LastRow = Cells.Find("*", [A1], , , xlByRows, xlPrevious).Row ' The alternative here is to use predefined Excel constants, like this:
' "Cells.SpecialCells(xlLastCell).Row".
'If LastRow is bigger, than {50} - this could be a reason of your Error.
For i = q To LastRow ' Here goes comparison between String and Integer data type, not good thing, but type conversion should work fine here.
UUID = Cells(i, "A") ' No need to perform re-set here, just move forward and assign value from this cell to the Instanse directly:
' Like this: Instance(f, 1) = Cells(i, "A")
Instance(f, 1) = UUID
g = 2
For j = 1 To LastRow ' It is another point, why "q" variable is not necessary. :)
If UUID = Cells(j, "A") Then ' You could use your Instansce value instead of UUID there, like this: "Instance(f, 1)"
Instance(f, g) = Cells(j, "B") 'If "g" variable will somehow become bigger, than {49}, this could become a reason of your Error.
g = g + 1
End If
Next j
f = f + 1
q = g - 1 ' "q" variable is not used after this row, so it is a strange unnecessary action
Next i
End Sub
现在,当我们获得有关错误的一些信息时,让我对您的代码进行一些改进。我确信,为了制作最简单的代码,您可以使用Excel工作表来存储和计算VBA数据作为后台自动化。但是如果你需要带数组的代码,那就让我们这样做吧! :)
Option Explicit ' It is an option that turns on check for every used variable to be defined before execution. If this option is not defined, your code below will find undefined variables and define them when they are used. Good practice is to use this option, because it helps you, for example to prevent missprinting errors in variable names.
Option Base 1 ' This option sets the default index value for arrays in your code. If this option is not set, the default index value will be {0}.
Const HEADER_ROW = 1 ' It is a number to identify your header row, next row after this one will be counted as a row with data
Const UUID = 1 ' ID of element in our "Instance" array to store UUID
Const DATES_ID = 2 ' ID of element in our "Instance" array to store dates
Function CountUUIDLoop()
ActiveSheet.Copy After:=ActiveSheet 'Copy your worksheet to new one to ensure that source data will not be affected.
Dim Instance(), dates() ' "Instance" will be used to store all the data, "dates" will be used to store and operate with dates
ReDim Instance(2, 1) ' Set first limitation to the "Instance" array in style [[uuid, dates],id]
ReDim dates(1) ' Set first limitation to the "dates" array
Instance(DATES_ID, 1) = dates
Dim CountUUID
Dim i as Long, j as Long, f as Long, active_element_id As Long 'Integer is quite enough to perform our array manipulations, but Long datatype is recomended (please refer to UPDATE2 below)
i = HEADER_ROW + 1 ' Set first row to fetch data from the table
active_element_id = 1 ' Set first active element number
With ActiveSheet ' Ensure that we are working on active worksheet.
While .Cells(i, 1) <> "" 'If operated cell is not empty - continue search for data
If i > HEADER_ROW + 1 Then
active_element_id = active_element_id + 1 ' increment active element number
ReDim Preserve Instance(2, active_element_id) ' Assign new limitation (+ 1) for our Instances, don't forget to preserve our results.
ReDim dates(1) ' Set first limitation to the "dates" array
Instance(DATES_ID, active_element_id) = dates
End If
Instance(UUID, active_element_id) = .Cells(i, 1) ' save UUID
dates(1) = .Cells(i, 2) ' save first date
j = i + 1 ' Set row to search next date from as next row from current one.
While .Cells(j, 1) <> "" 'If operated cell is not empty - continue search for data
If .Cells(j, 1) = .Cells(i, 1) Then
ReDim Preserve dates(UBound(dates) + 1) ' Expand "dates" array, if new date is found.
dates(UBound(dates)) = .Cells(j, 2) ' Save new date value.
.Cells(j, 1).EntireRow.Delete 'Remove row with found date to exclude double checking in future
Else
j = j + 1 ' If uuid is not found, try next row
End If
Wend
Instance(DATES_ID, active_element_id) = dates
i = i + 1 'After all the dates are found, go to the next uuid
Wend
.Cells(i, 1) = "UUID COUNT" ' This will write you a "UUID COUNT" text in A column below all the rest of UUIDs on active worksheet
.Cells(i, 2) = i - HEADER_ROW - 1 ' This will write you a count of UUIDs in B column below all the rest of UUIDs on active worksheet
End With
CountUUIDLoop = Instance ' This ensures that your function (!) returns an array with all UUIDs and dates inside.
End Function
此函数将在活动工作表底部打印您的UUID计数,并返回如下数组:
[[LOGIN1][1], [[DATE1][DATE2][DATE3]][1]]
我已经使用了这种存储数据的顺序,以避免扩展多维数组时出错。此错误与您的错误相似,因此您可以在此处详细了解:How can I "ReDim Preserve" a 2D Array in Excel 2007 VBA so that I can add rows, not columns, to the array?
Excel VBA - How to Redim a 2D array?
ReDim Preserve to a Multi-Dimensional Array in Visual Basic 6
无论如何,您可以使用我的函数输出("Instance" array
)来执行您的进一步操作,以找到您需要的内容,甚至可以显示您的uuid-dates值。 :)
祝你在进一步的VBA行动中好运!
以下是测试程序,说明如何使用上述功能的结果:
Sub test()
Dim UUIDs ' The result of the "CountUUIDLoop" function will be stored there
Dim i as Long, j As Long ' Simple numeric variables used as indexies to run through our resulting array
UUIDs = CountUUIDLoop ' assign function result to a new variable
Application.DisplayAlerts = False ' Disable alerts from Excel
ActiveSheet.Delete ' Delete TMP worksheet
Application.DisplayAlerts = True ' Enable alerts from Excel
If UUIDs(UUID, 1) <> Empty Then ' This ensures that UUIDs array is not empty
Sheets.Add After:=ActiveSheet ' Add new worksheet after active one to put data into it
With ActiveSheet 'Ensure that we are working with active worksheet
.Cells(HEADER_ROW, 1) = "UUIDs/dates" ' Put the header into the "HEADER_ROW" row
For i = 1 To UBound(UUIDs, 2) ' run through all the UUIDs
.Cells(1 + HEADER_ROW, i) = UUIDs(UUID, i) ' Put UUID under the header
For j = 1 To UBound(UUIDs(DATES_ID, i)) ' run through all the dates per UUID
.Cells(j + 1 + HEADER_ROW, i) = UUIDs(DATES_ID, i)(j) ' put date into column below the UUID
Next j ' Go to next date
Next i ' Go to next UUID
.Cells.EntireColumn.AutoFit ' This will make all columns' width to fit its contents
End With
Else
MsgBox "No UUIDs are found!", vbCritical, "No UUIDs on worksheet" ' Show message box if there are no UUIDs in function result
End If
End Sub
因此,如果您在活动工作表上有以下数据:
..............的 A 强> .................的乙强>
1 ........ LOGIN1 ....... DATE1
2 ........ LOGIN1 ....... DATE2
3 ........ LOGIN1 ....... DATE3
4 ........ Login2身份....... DATE4
5 ........ Login2身份....... DATE5
6 ........ LOGIN3 ....... DATE6
...这个子将UUID放在新表上,如下所示:
..............的 A 强> .................的乙强> ... .............. <强> C 强>
1点........的UUID /日期
2 ........ LOGIN1 ........ Login2身份........ LOGIN3
3 ........ DATE1 ......... ......... DATE4
DATE6
4 ........ DATE2 ......... DATE5
5 ........ DATE3
<强> UPDATE2 强>
当需要整数(或整数)变量时,建议使用Long
数据类型而不是Integer
每种类型。 Long
稍微快一点,它有更广泛的限制,并且不需要额外的内存。这是证明链接:
MSDN:The Integer, Long, and Byte Data Types