I need to get a page number so that I can extract text from that specific page of a PDF document. I am using an Excel VBA function that uses the JSObject from the Acrobat 10.0 Type Library.
Here is the code. It fails when I try to read the pageNum property of the Doc object. I want to avoid the AV layer and use only the PD layer, so that the macro runs in the background and does not launch the Acrobat application.
Function getTextFromPDF_JS(ByVal strFilename As String) As String
    Dim pdDoc As New AcroPDDoc
    Dim pdfPage As Acrobat.AcroPDPage
    Dim pdfBookmark As Acrobat.AcroPDBookmark
    Dim jso As Object
    Dim BookMarkRoot As Object
    Dim vBookmark As Variant
    Dim objSelection As AcroPDTextSelect
    Dim objHighlight As AcroHiliteList
    Dim currPage As Integer
    Dim strText As String
    Dim BM_flag As Boolean
    Dim count As Integer
    Dim word As Variant
    Dim i As Long, tCount As Long

    strText = ""
    If (pdDoc.Open(strFilename)) Then
        Set jso = pdDoc.GetJSObject
        Set BookMarkRoot = jso.BookMarkRoot
        vBookmark = jso.BookMarkRoot.Children
        ' Check whether a particular bookmark exists within the PDF
        Set pdfBookmark = CreateObject("AcroExch.PDBookmark")
        BM_flag = pdfBookmark.GetByTitle(pdDoc, "Title Page")
        If (BM_flag) Then
            For i = 0 To UBound(vBookmark)
                If vBookmark(i).Name = "Title Page" Then
                    vBookmark(i).Execute
                    currPage = jso.pageNum ' reading pageNum is where the code fails
                    Set pdfPage = pdDoc.AcquirePage(currPage)
                    Set objHighlight = New AcroHiliteList
                    objHighlight.Add 0, 10000 ' Increase this if not all the text on the page is returned
                    Set objSelection = pdfPage.CreatePageHilite(objHighlight)
                    If Not objSelection Is Nothing Then
                        For tCount = 0 To objSelection.GetNumText - 1
                            strText = strText & objSelection.GetText(tCount)
                        Next tCount
                    End If
                    Exit For
                End If
            Next i
        End If
        pdDoc.Close ' close the document whether or not the bookmark was found
    End If
    getTextFromPDF_JS = strText
End Function
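Answer 0 (score: 0)
jso.pageNum = 0        ' set the page number
pageNo = jso.pageNum   ' get the page number
Edit: 3.3.19
Hmm, it looks like you have to use an AVDoc to get the actual current page through jso.pageNum. Even if you use an AVDoc, the Acrobat window stays hidden in the background. Example: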
strFilename = "d:\Test2.pdf"
Set avDoc = CreateObject("AcroExch.AVDoc")
If (avDoc.Open(strFilename, "")) Then
    Set pdDoc = avDoc.GetPDDoc()
    Set jso = pdDoc.GetJSObject
    pageNo = jso.pageNum
    MsgBox (pageNo)
End If