我有一组Excel工作表,每个工作表设置如下:
ID | imageName
--------------
1 abc.jpg
2 def.bmp
3 abc.jpg
4 xyz123.jpg
此表格对应于包含以下内容的文件夹:
abc.pdf
ghijkl.pdf
def.pdf
def.xls
x-abc.pdf
我正在尝试生成一个报告,该报告将每个imageName
的实例与最低ID
的实例匹配,并与其匹配,并在工作表中标识不匹配的imageName
文件夹中不匹配的PDF。带有“x-”前缀的文件名相当于没有前缀的文件名,因此该数据集的报告如下:
ID imageName filename
-----------------------
1 abc.jpg abc.pdf
1 abc.jpg x-abc.pdf
2 def.bmp def.pdf
4 xyz123.jpg
ghijkl.pdf
我目前的解决方案如下:
'sheetObj is the imageName set, folderName is the path to the file folder
sub makeReport(sheetObj as worksheet,folderName as string)
dim fso as new FileSystemObject
dim imageDict as Dictionary
dim fileArray as variant
dim ctr as long
'initializes fileArray for storing filename/imageName pairs
redim fileArray(1,0)
'returns a Dictionary where key is imageName and value is lowest ID for that imageName
set imageDict=lowestDict(sheetObj)
'checks all files in folder and populates fileArray with their imageName matches
for each file in fso.getfolder(folderName).files
fileFound=false
'gets extension and checks if it's ".pdf"
if isPDF(file.name) then
for each key in imageDict.keys
'checks to see if base names are equal, accounting for "x-" prefix
if equalNames(file.name,key) then
'adds a record to fileArray mapping filename to imageName
addToFileArray fileArray,file.path,key
fileFound=true
end if
next
'checks to see if filename did not match any dictionary entries
if fileFound=false then
addToFileArray fileArray,file.path,""
end if
end if
next
'outputs report of imageDict entries and their matches (if any)
for each key in imageDict.keys
fileFound=false
'checks for all fileArray matches to this imageName
for ctr=0 to ubound(fileArray,2)
if fileArray(0,ctr)=key then
fileFound=true
'writes the data for this match to the worksheet
outputToExcel sheetObj,key,imageDict(key),fileArray(0,ctr)
end if
next
'checks to see if no fileArray match was found
if fileFound=false then
outputToExcel sheetObj,key,imageDict(key),""
end if
next
'outputs unmatched fileArray entries
for ctr=0 to ubound(fileArray,2)
if fileArray(1,ctr)="" then
outputToExcel sheetObj,"","",fileArray(0,ctr)
end if
next
此程序成功输出报告,但速度非常慢。由于嵌套的For循环,随着imageName
条目和文件数量的增加,处理它们的时间呈指数级增长。
有没有更好的方法来检查这些套装中的匹配?如果我将fileArray
放入字典中可能会更快,但字典不能有重复的键,并且此数据结构需要在其字段中具有重复条目,因为文件名可能匹配多个imageNames,反之亦然。
答案 0 :(得分:0)
这应该很快找到第一个。你可以在最后一句if语句的内部做任何你想做的事。它使用ADO记录集,它应该比嵌套for循环
更快Sub match()
Dim sheetName As String: sheetName = "Sheet1"
Dim rst As New ADODB.Recordset
Dim cnx As New ADODB.Connection
Dim cmd As New ADODB.Command
'setup the connection
'[HDR=Yes] means the Field names are in the first row
With cnx
.Provider = "Microsoft.Jet.OLEDB.4.0"
.ConnectionString = "Data Source='" & ThisWorkbook.FullName & "'; " & "Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'"
.Open
End With
'setup the command
Set cmd.ActiveConnection = cnx
cmd.CommandType = adCmdText
cmd.CommandText = "SELECT * FROM [" & sheetName & "$]"
rst.CursorLocation = adUseClient
rst.CursorType = adOpenDynamic
rst.LockType = adLockOptimistic
'open the connection
rst.Open cmd
Dim fso As FileSystemObject: Set fso = New FileSystemObject
Dim filesInFolder As files, f As File
Set filesInFolder = fso.GetFolder("C:\Users\Bradley\Downloads").files
For Each f In filesInFolder
rst.MoveFirst
rst.Find "imageName = '" & f.Name & "'", , adSearchForward
If Not rst.EOF Then
Debug.Print rst("imagename") & "::" & rst("ID") '<-- Do what you need to do here
End If
Next f
End Sub
仅供参考:我引用了this帖子
答案 1 :(得分:0)
另一种方式。
Sub Sample()
Dim ws As Worksheet, wstemp As Worksheet
Dim FileAr() As String
Dim n As Long, wsLRow As Long
Set ws = Sheets("Sheet1") '<~~ Which has imageNames
wsLRow = ws.Range("A" & ws.Rows.Count).End(xlUp).Row
n = 0
strFile = Dir("C:\Temp\*.*")
Do While strFile <> ""
n = n + 1
ReDim Preserve FileAr(n)
If Mid(strFile, Len(strFile) - 3, 1) = "." Then
FileAr(n) = Mid(strFile, 1, Len(strFile) - 4)
ElseIf Mid(strFile, Len(strFile) - 4, 1) = "." Then
FileAr(n) = Mid(strFile, 1, Len(strFile) - 5)
Else
FileAr(n) = strFile
End If
strFile = Dir
Loop
Set wstemp = Worksheets.Add
wstemp.Range("A1").Resize(UBound(FileAr) + 1, 1).Value = Application.Transpose(FileAr)
ws.Range("B1:B" & wsLRow).Formula = "=IF(ISERROR(VLOOKUP(A1," & wstemp.Name & _
"!A:A,1,0)),"""",VLOOKUP(A1," & wstemp.Name & "!A:A,1,0))"
ws.Range("B1:B" & wsLRow).Value = ws.Range("B1:B" & wsLRow).Value
Application.DisplayAlerts = False
wstemp.Delete
Application.DisplayAlerts = True
End Sub
答案 2 :(得分:0)
感谢您的回复。
我最终通过在folderName
中创建一个文件名数组来解决这个问题,使用WinAPI FindFirstFile
和FindNextFile
函数来浏览文件夹,因为它是通过网络进行迭代的通过fso.getfolder(foldername).files
返回的集合太慢了。
然后我从文件名数组中创建了一个文件名/基本名字字典,如下:
key | value
-----------------------
abc.pdf | abc
x-lmnop.pdf | lmnop
x-abc.pdf | abc
从这本词典中我制作了一个反向词典fileConcat
,它将来自重复基本名称的键连接起来,如下:
key | value
-----------------------
abc | abc.pdf,x-abc.pdf
lmnop | lmnop.pdf
然后我能够将每个imageDict
键的基名与fileConcat
中的键匹配,然后迭代生成的连接值数组:
split(fileConcat(key))
其中key
是imageDict
键的基本名称。
正如@chrisneilsen所评论的那样,消除嵌套的For循环会将增长率降低到O(ImageNames)+O(Files)
,并且该功能现在以令人满意的速度运行。