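I'm writing a VB.NET WebForms website, and one page has to load a list of files into a listbox. It needs to load all PDF and TIF files in a directory that have no entry in the database. I'm doing this successfully with the code below. Basically, I query the database for an ArrayList of file name entries, then loop over every file in the directory and check its name against each entry in the ArrayList. If the name is not in the ArrayList, I add it to a list that gets bound to the listbox: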
Dim category As String = "RFQ"
'Initialize database connection variables
Dim sql As String
Dim query As System.Data.SqlClient.SqlCommand
Dim result As System.Data.SqlClient.SqlDataReader
'Load document list from database
Dim savedfiles As New ArrayList
database.Open() 'Open connection to database
sql = "SELECT filename FROM fileheaders WHERE [category] = '" & category & "'" 'SQL query to read file header information
query = New System.Data.SqlClient.SqlCommand(sql, database) 'Create query to send to database
result = query.ExecuteReader() 'Execute query
While result.Read()
savedfiles.Add(row(result, "filename"))
End While
result.Close()
database.Close() 'Close the same connection that was opened above
'The following code section pulls all files from the current file directory.
Dim filelist = New ArrayList
Dim dir As New System.IO.DirectoryInfo(dirName) 'Get directory information
Dim files As System.IO.FileInfo() = dir.GetFiles() 'Get all files in directory
Dim file As System.IO.FileInfo
Dim i As Integer = 0
For Each file In files
If ((file.Extension Like ".pdf") Or (file.Extension Like ".tif")) And Not inArray(savedfiles, file.Name) Then
filelist.Add(file.Name) 'Add .pdf and .tif files to list of documents
End If
Next
filelist.TrimToSize()
eltFilelist.DataSource = filelist
eltFilelist.DataBind() 'Bind document list to listbox
And here is the code for the inArray function:
Function inArray(arr As ArrayList, str As String) As Boolean
For Each item In arr
If TypeOf (item) Is String Then
If str = item Then
Return True
Exit Function
End If
End If
Next
Return False
End Function
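The problem: it works, but it looks very inefficient. There are about 27,000 files in the directory and about 26,000 file entries in the database, so I'm checking each of 27,000 file names against a list of 26,000 names. Without turning this into a combinatorics problem, that is hundreds of millions of string comparisons (up to 27,000 × 26,000 = 702,000,000 in the worst case). Is there a more efficient way to do this?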
Answer 0 (score: 0)
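Use a Dictionary or Hashtable to hold the file names from the query instead of an ArrayList.
Your inArray function performs an O(n) scan of the whole list for every file found, which is very slow.
Both collections offer a hash-based key lookup (ContainsKey on Dictionary, Contains or ContainsKey on Hashtable) that finds a file name much faster.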
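As a rough sketch of the idea above, the lookup could look something like this. Note that it swaps in HashSet(Of String), a related hash-based collection that stores keys only, rather than the Dictionary or Hashtable named in this answer. The names FindUnsaved, savedNames and directoryNames are hypothetical, and the case-insensitive comparer is an assumption (the original inArray compares case-sensitively; use StringComparer.Ordinal to keep that behavior).

```vbnet
Imports System
Imports System.Collections.Generic

Module HashLookupSketch
    ' Returns the names from directoryNames that are absent from savedNames.
    ' (Hypothetical helper, not part of the original code.)
    Function FindUnsaved(savedNames As IEnumerable(Of String),
                         directoryNames As IEnumerable(Of String)) As List(Of String)
        ' Building the set takes one pass over the database names.
        ' Assumption: file names should match regardless of case.
        Dim saved As New HashSet(Of String)(savedNames, StringComparer.OrdinalIgnoreCase)
        Dim unsaved As New List(Of String)
        For Each name As String In directoryNames
            ' Contains is a hash lookup, not a scan of every saved name.
            If Not saved.Contains(name) Then
                unsaved.Add(name)
            End If
        Next
        Return unsaved
    End Function

    Sub Main()
        ' Tiny stand-ins for the database result and the directory listing.
        Dim savedNames = New String() {"a.pdf", "b.tif"}
        Dim directoryNames = New String() {"a.pdf", "B.TIF", "c.pdf"}
        For Each name In FindUnsaved(savedNames, directoryNames)
            Console.WriteLine(name) ' prints c.pdf
        Next
    End Sub
End Module
```

With a set like this, each of the roughly 27,000 directory entries costs one hash lookup on average instead of a pass over roughly 26,000 saved names.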
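Answer 1 (score: 0)
You can use SQL parameters to avoid problems with the category string (for example, an apostrophe in it would break the concatenated query string), fetch only the files in the directory that have the extensions you are interested in, and use LINQ to get the missing files in a simple way: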
Imports System.Data.SqlClient
Imports System.IO
Module Module1
Function GetMissingFiles(sourceDirectory As String, category As String) As List(Of String)
Dim missingFiles As New List(Of String)
Dim filesInDatabase As New List(Of String)
' Query the database for the files in the given category'
Using conn As New SqlConnection("connection string here")
conn.Open()
Dim sqlCmd As String = "SELECT filename FROM fileheaders WHERE [category] = @category"
Dim query As New System.Data.SqlClient.SqlCommand(sqlCmd, conn)
'TODO: change .SqlDbType to what it is in the database.'
query.Parameters.Add(New SqlParameter With {.ParameterName = "@category", .SqlDbType = SqlDbType.NVarChar, .Value = category})
' Dispose the reader when done; the enclosing Using block closes the connection.'
Using rdr As SqlDataReader = query.ExecuteReader()
While rdr.Read()
filesInDatabase.Add(rdr.GetString(0))
End While
End Using
End Using
'TODO: it could be that filesInDatabase.Count = 0 is valid. Adjust if required.'
If filesInDatabase.Count > 0 Then
' Get the existing files from the given directory.
' the extensions we are going to consider
Dim extensions() As String = {"pdf", "tif"}
Dim existingFiles As New List(Of String)
' get all the filenames (without the path) to consider'
For Each extn In extensions
existingFiles.AddRange(Directory.GetFiles(sourceDirectory, "*." & extn).Select(Function(p) Path.GetFileName(p)))
Next
missingFiles = existingFiles.Except(filesInDatabase).ToList()
End If
Return missingFiles
End Function
Sub Whatever()
Dim myMissingFiles As List(Of String) = Nothing ' stays Nothing if GetMissingFiles throws'
Try
myMissingFiles = GetMissingFiles("C:\temp", "RFQ")
Catch ex As Exception
' Inform user it went wrong.'
End Try
If myMissingFiles IsNot Nothing AndAlso myMissingFiles.Count > 0 Then
eltFilelist.DataSource = myMissingFiles
eltFilelist.DataBind()
End If
End Sub
End Module