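I was wondering whether anyone has dealt with this before. I have a spreadsheet containing links to thousands of PDF files. I want to load the content of each PDF into a string variable and run some RegEx on it to extract useful data. The function shown below loads the content of a PDF file into a string, but it only works for local files. In my case I open the PDF with IE.Navigate2 "https://www.example.com/mypdf.pdf", which opens it in the browser. How can I load the content of that file into a string?
The extreme solution would be to download the file, open it with the function below, and then delete it. Please let me know what you think. Note that the function below only works if Acrobat (not Reader) is installed, and you also need to add a reference to the Adobe Acrobat type library in your VBA project.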
Public Function ReadAcrobatDocument(strFileName As String) As String
    Dim AcroApp As CAcroApp, AcroAVDoc As CAcroAVDoc, AcroPDDoc As CAcroPDDoc
    Dim AcroTextSelect As CAcroPDTextSelect
    Dim PageNumber As Object, PageContent As Object
    Dim Content As String, i As Long, j As Long

    Set AcroApp = CreateObject("AcroExch.App")
    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")
    ' The second argument is the window title; pass "" rather than vbNull (the number 1)
    If AcroAVDoc.Open(strFileName, "") <> True Then
        ' Shut Acrobat down again so a failed open does not leave it running
        AcroApp.Exit
        Exit Function
    End If
    ' The following While-Wend loop shouldn't be necessary but timing issues may occur.
    While AcroAVDoc Is Nothing
        Set AcroAVDoc = AcroApp.GetActiveDoc
    Wend
    Set AcroPDDoc = AcroAVDoc.GetPDDoc
    For i = 0 To AcroPDDoc.GetNumPages - 1
        Set PageNumber = AcroPDDoc.AcquirePage(i)
        Set PageContent = CreateObject("AcroExch.HiliteList")
        ' Select up to 9000 text elements on the page; stop reading if the list can't be built
        If PageContent.Add(0, 9000) <> True Then Exit For
        Set AcroTextSelect = PageNumber.CreatePageHilite(PageContent)
        ' Pages with no selectable text may give Nothing here, so skip them
        If Not AcroTextSelect Is Nothing Then
            ' Suppress errors from protected PDFs that can't be read, for this loop only
            On Error Resume Next
            For j = 0 To AcroTextSelect.GetNumText - 1
                Content = Content & AcroTextSelect.GetText(j)
            Next j
            On Error GoTo 0
        End If
    Next i
    ReadAcrobatDocument = Content
    ' True = close without saving
    AcroAVDoc.Close True
    AcroApp.Exit
    Set AcroAVDoc = Nothing: Set AcroApp = Nothing
End Function
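
For reference, the download-then-delete workaround mentioned in the question could look roughly like the sketch below. It is only a sketch under some assumptions: the name ReadRemotePdf and the temp file name are made up for illustration, the link is assumed to be reachable without the cookies or login of the IE session, and MSXML2.XMLHTTP plus ADODB.Stream (both late-bound, so no extra references) are assumed to be available on the machine.

```vba
' Hypothetical helper (not part of the original post): downloads the PDF at
' url to a temporary file, reads it with ReadAcrobatDocument, then deletes it.
Public Function ReadRemotePdf(url As String) As String
    Dim http As Object, stream As Object, tempPath As String

    ' Assumption: the URL can be fetched without the browser session
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False              ' False = synchronous request
    http.send
    If http.Status <> 200 Then Exit Function ' returns "" when the download fails

    ' Illustrative temp file name; it is overwritten on every call
    tempPath = Environ$("TEMP") & "\remote_pdf_tmp.pdf"

    ' Write the raw response bytes to disk
    Set stream = CreateObject("ADODB.Stream")
    stream.Type = 1                          ' 1 = adTypeBinary
    stream.Open
    stream.Write http.responseBody
    stream.SaveToFile tempPath, 2            ' 2 = adSaveCreateOverWrite
    stream.Close

    ReadRemotePdf = ReadAcrobatDocument(tempPath)
    Kill tempPath                            ' delete the temporary copy
End Function
```

It would then be called once per link, for example strContent = ReadRemotePdf("https://www.example.com/mypdf.pdf"), before running the RegEx on the result. Starting and exiting Acrobat for each of several thousand files is likely to be slow, so reusing one AcroApp instance across calls may be worth trying.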