Question

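I urgently need help. I am trying to search for a text string in a folder that contains more than 5,000 PDFs. The code has been tested and works with fewer than 100 PDF files, but at the full folder size it takes 5 to 10 minutes to return results. Any help is much appreciated: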

<%
'Text to search for, taken from the request
Dim strtextToSearch
strtextToSearch = Request("TextToSearch")

'InStr treats an empty search string as a match, so stop early if nothing was entered
If Len(strtextToSearch) = 0 Then
    Response.Write "Please enter text to search for."
    Response.End
End If

'Now, search all of the files
Dim fso

'IOMode constant for OpenTextFile
Const ForReading = 1
Set fso = Server.CreateObject("Scripting.FileSystemObject")

'Specify the folder path to search
Dim FolderToSearch
FolderToSearch = "C:\inetpub\site\Files\allpdfs\"

'Proceed only if the folder exists
If fso.FolderExists(FolderToSearch) Then

    Dim objFolder
    Set objFolder = fso.GetFolder(FolderToSearch)

    Dim objFile, objTextStream, strFileContents

    Dim FilesCounter
    FilesCounter = 0 'Total files found

    'Every request opens and reads every file in full, so the run time grows
    'with the total size of the folder, not with the number of matches.
    'Note: many PDFs store their text in compressed streams, so a plain
    'substring match on the raw file contents can miss visible text.
    For Each objFile In objFolder.Files
        Set objTextStream = fso.OpenTextFile(objFile.Path, ForReading)
        'Read the whole file into memory
        strFileContents = objTextStream.ReadAll
        objTextStream.Close
        'vbTextCompare (= 1) makes the comparison case-insensitive
        If InStr(1, strFileContents, strtextToSearch, vbTextCompare) > 0 Then
        %>
           <a href="http://go.to.mysite.com/files/allpdfs/<%= Server.HTMLEncode(objFile.Name) %>" target="_blank">
        <%
           Response.Write Server.HTMLEncode(objFile.Name) & "</a><br>"
           FilesCounter = FilesCounter + 1
        End If
    Next

    If FilesCounter = 0 Then
        Response.Write "Sorry, no matches found."
    Else
        Response.Write "Total files found: " & FilesCounter
    End If

    'Release the objects
    Set objTextStream = Nothing
    Set objFolder = Nothing
Else
    Response.Write "Sorry, invalid folder name."
End If
Set fso = Nothing
%>
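A minimal sketch of the idea behind indexing, written in Python rather than VBScript: build an inverted index (word to set of files) once, then answer each query with set lookups instead of rereading every file. It assumes the text has already been extracted from each PDF into a `{file_name: text}` dictionary; that extraction step, the sample file names, and the helper names `build_index` and `search` are hypothetical and not part of the original code. Unlike `InStr`, this matches whole words rather than arbitrary substrings.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to the set of file names that contain it.

    `documents` is assumed to be {file_name: extracted_text}; pulling the
    text out of the PDFs is a separate, one-off step not shown here.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the files that contain every word of `query` (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical extracted text for three files
docs = {
    "a.pdf": "Quarterly budget report",
    "b.pdf": "Annual budget summary",
    "c.pdf": "Meeting notes",
}
idx = build_index(docs)
print(sorted(search(idx, "budget")))
print(sorted(search(idx, "annual budget")))
```

In practice the index would be built once (and updated when files are added or changed) and kept in memory or on disk between requests, so each search costs a few lookups instead of a full scan of the folder.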

Answer 1

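Doing a full search on every request is going to take forever. You'd be better off using an indexer like Solr, which maintains a search index and returns results quickly.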

Here is a good starting point: http://wiki.apache.org/solr/
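A sketch of what a query against Solr could look like once the PDFs are indexed (for example through Solr's ExtractingRequestHandler, also called Solr Cell, which uses Apache Tika to pull text out of PDFs). It only builds the request URL for Solr's standard `/select` handler; the host, the core name `pdfs`, and the field name `content` are assumptions for illustration and depend on how the index is actually configured.

```python
from urllib.parse import urlencode

# Assumed local Solr instance (8983 is Solr's default port) and a
# hypothetical core named "pdfs"
SOLR_BASE = "http://localhost:8983/solr/pdfs"


def phrase_query(field, text):
    """Build a Lucene phrase query, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def select_url(text, rows=50):
    """Return a /select URL that asks Solr for the ids of matching documents."""
    params = {
        "q": phrase_query("content", text),  # "content" is an assumed field name
        "fl": "id",  # only return the document id (e.g. the file name)
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


print(select_url("annual budget"))
```

From the ASP page, the same URL could be requested server-side (for example with `MSXML2.ServerXMLHTTP`) and the returned ids written out as links, so each search becomes one index lookup instead of a scan of every file.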

Faster search for a text string in the PDFs in a folder
