VBA从pdf复制数据

时间:2015-03-03 15:07:10

标签: excel vba excel-vba pdf

我有大量的pdf文件,我希望将文件中的所有数据复制到电子表格中的列。

以下是我的代码。它只是打开pdf,使用control-a,然后使用control-c进行复制,然后激活工作簿,找到一个打开的列并使用control-v Sendkey粘贴数据。它工作正常,但它只粘贴最后一个文件中的lastdata(我有一个带有路径名的范围,它打开并复制所有数据,但实际上只粘贴最后一个)。

    Sub StartAdobe1()

     Dim AdobeApp As String
     Dim AdobeFile As String
     Dim StartAdobe
     Dim fname As Variant
     Dim iRow As Integer
     Dim Filename As String

 For Each fname In Range("path")

   AdobeApp = "C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe"
   StartAdobe = Shell("" & AdobeApp & " " & fname & "", 1)


   Application.Wait Now + TimeValue("00:00:01")
    SendKeys "^a", True
     Application.Wait Now + TimeValue("00:00:01")
    SendKeys "^c"
   Application.Wait Now + TimeValue("00:00:01")
    SendKeys ("%{F4}")
    Windows("transfer (Autosaved).xlsm").Activate
    Worksheets("new").Activate


    ActiveSheet.Range("A1").Select
    Selection.End(xlToRight).Offset(0, 1).Select

    SendKeys "^v"
    Application.Wait Now + TimeValue("00:00:2")

Next fname

6 个答案:

答案 0 :(得分:2)

Jeanno是对的,如果你有Acrobat,那么使用它的API库直接使用该文件比解决方法要好得多。我每天都使用它来将pdf文件转换为数据库条目。

您的代码存在一些问题,但我怀疑最大的问题是使用SendKeys "^v"粘贴到Excel中。您最好选择所需的单元格,然后使用Selection.Paste。或者甚至更好,将剪贴板的内容传输到变量,然后在写入电子表格之前根据需要在后端解析 - 但这增加了一堆复杂性,在这种情况下对你没有多大帮助。 / p>

要使用以下代码,请务必在工具>参考文献下选择“Acrobat x.x类型库”。

Sub StartAdobe1()
    Dim fName       As Variant
    Dim wbTransfer  As Excel.Workbook
    Dim wsNew       As Excel.Worksheet
    Dim dOpenCol    As Double
    Dim oPDFApp     As AcroApp
    Dim oAVDoc      As AcroAVDoc
    Dim oPDDoc      As AcroPDDoc

    'Define your spreadsheet
    Set wbTransfer = Workbooks("transfer (Autosaved).xlsm")
    Set wsNew = wbTransfer.Sheets("new")
    'Find first open column
    dOpenCol = ws.Cells(1, columns.count).End(xlToleft).Column + 1

    'Instantiate Acrobat Objects
    Set oPDFApp = CreateObject("AcroExch.App")
    Set oAVDoc = CreateObject("AcroExch.AVDoc")
    Set oPDDoc = CreateObject("AcroExch.PDDoc")

For Each fName In Range("path")

    'Open the PDF file. The AcroAVDoc.Open function returns a true/false 
    'to tell you if it worked
    If oAVDoc.Open(fName.Text, "") = True Then
        Set oPDDoc = oAVDoc.GetPDDoc
    Else
        Debug.Assert False
    End If

    'Copy all using Acrobat menu
    oPDFApp.MenuItemExecute ("SelectAll")
    oPDFApp.MenuItemExecute ("Copy")

    'Paste into open column
    wbTransfer.Activate
    wsNew.Cells(1, dOpenCol).Select
    ActiveSheet.Paste

    'Select next open column
    dOpenCol = dOpenCol + 1

    oAVDoc.Close (1)    '(1)=Do not save changes
    oPDDoc.Close

Next

    'Clean up
    Set wbTransfer = Nothing
    Set wsNew = Nothing
    Set oPDFApp = Nothing
    Set oAVDoc = Nothing
    Set oPDDoc = Nothing


End Sub

注意: 1 - 还有一个菜单项oPDFApp.MenuItemExecute ("CopyFileToClipboard")应该选择全部并一步复制,但我遇到了问题所以我坚持上面的两步法。

2-pdf文件由两个对象oAVDocoPDDoc组成。文件的不同方面由每个控制。在这种情况下,您可能只需要oAVDoc。尝试评论处理oPDDoc的行,看看它是否有效。

答案 1 :(得分:0)

我无法让你的代码工作,但我的猜测是它复制了所有数据,但每次都通过循环覆盖它。要解决此问题,请尝试:

ActiveSheet.Cells(1, ActiveSheet.Cells(1, Columns.Count).End(xlToLeft).Column + 1).Select

而不是开始activesheet.range(“A1”)的两行。选择和Selection.End ....

答案 2 :(得分:0)

尝试此代码可能会有效:

 Sub Shell_Copy_Paste()

   Dim o As Variant
   Dim wkSheet As Worksheet

   Set wkSheet = ActiveSheet

   o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\red.pdf", vbNormalFocus)

   Application.Wait (Now + TimeSerial(0, 0, 2)) 'Wait for Acrobat to load

   SendKeys "^a"   'Select All
   SendKeys "^c"   'Copy
   SendKeys "%{F4}"    'Close shell application

   wkSheet.Range("B5").Select
   SendKeys "^v"   'Paste

End Sub

答案 3 :(得分:0)

以下代码将复制PDF格式的数据将它按字样粘贴,然后将文字复制,然后将其粘贴到EXCEL上。

现在为什么我要将数据从pdf复制到word&然后从word复制并将其粘贴到excel,因为我想要将pdf中的数据精确格式化为我的Excel表格,如果我直接从pdf复制到excel,它会将pdf中的整个数据粘贴到单个单元格中即使我是有两列或多行,它会将我的所有数据粘贴到一个列中,单个单元格中也是如此,但如果我从word复制到excel,它将保留其原始格式,并且两列将仅在excel中粘贴为两列。 / p>

Private Sub CommandButton3_Click()  '(load pdf)


   Dim o As Variant
Set appWord = CreateObject("Word.Application")
 o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf", vbNormalFocus)   'loading adobe reader & pdf file from their location
 Application.Wait (Now + TimeSerial(0, 0, 2))
   SendKeys ("^a")
SendKeys ("^c")
 SendKeys "%{F4}"
Application.Wait Now + TimeValue("00:00:01")
 Set appWord = CreateObject("Word.Application")
 appWord.Visible = True
 appWord.Documents.Add.Content.Paste
With appWord

       .ActiveDocument.SaveAs Filename:=ThisWorkbook.Path & "\pdf" & ".docx", FileFormat:=wdocument   'saving word file in docx format
        .ActiveWindow.Close
        .Quit
    End With

MsgBox " pdf is loaded "
MsgBox " Paste to EXCEL "

    Set appWord = CreateObject("Word.Application")
     appWord.Visible = True

  appWord.Documents.Open "C:\Users\saurabh.ad.sharma\Desktop\pdf.docx" 'opening word document
      appWord.Selection.WholeStory
      appWord.Selection.Copy
   Set wkSheet = ActiveSheet
    wkSheet.Range("A1").Select
    wkSheet.Paste 'pasting to the excel file

End Sub

答案 4 :(得分:0)

这是我上面代码的修改版本,它不会保存任何文件,它会将数据保存在剪贴板中并快速执行..

Private Sub CommandButton3_Click()  '(load pdf)


   Dim o As Variant
Set appWord = CreateObject("Word.Application")
 o = Shell("C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe C:\Users\saurabh.ad.sharma\Desktop\Book1.pdf2", vbNormalFocus)
 Application.Wait (Now + TimeSerial(0, 0, 2))
   SendKeys ("^a")
SendKeys ("^c")
 SendKeys "%{F4}"
Application.Wait Now + TimeValue("00:00:01")
 Set appWord = CreateObject("Word.Application")
 appWord.Visible = False
 appWord.Documents.Add.Content.Paste
With appWord

       .Selection.WholeStory
       .Selection.Copy
       .ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
       .Quit
End With

MsgBox " pdf is loaded "
MsgBox " Paste to EXCEL "


   Set wkSheet = ActiveSheet
    wkSheet.Range("A1").Select
    wkSheet.Paste

End Sub

答案 5 :(得分:0)

我有类似的问题。如前所述,最好的解决方案是使用Adobe API。在我的情况下,这是不可能的,因为宏适用于100多个PC上没有安装Adobe Pro的用户。

使用sendkeys的问题是您必须等到PDF文件被加载并且加载时间容易改变。小文件可能需要1秒钟,从共享驱动器打开大文件可能需要8秒钟。

请在下面找到我的解决方案,该解决方案经过优化,可与任何大小的文件一起使用。很稳定我发现只有在Adobe显示打开消息时它才会失败(例如“更新可用”)。

Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr) 'Initialize sleep function via Windows API
Public Sub CopyToSheet(destinationSheet As Worksheet, pathToPdf as String)
'Copy data from PDF to worksheet

    'Initialize timer
    Dim StartTime As Double
    StartTime = Timer

    'Clear clipboard
    Dim myData As DataObject
    Set myData = New DataObject
    myData.SetText text:=Empty
    myData.PutInClipboard
    Set myData = Nothing

    'Build file paths
    Dim pathToAdobe As String
    pathToAdobe = """C:\Program Files (x86)\Adobe\Acrobat Reader DC\Reader\AcroRd32.exe"""
    pathToPdf = """" & pathToPdf & """"

    'Open PDF and wait untill it is open. If file is already opened it will be just activated
    Dim pdfId As Long
    pdfId = Shell(pathToAdobe & " " & pathToPdf, vbMaximizedFocus)
    Do
        Sleep (500)
        If Round(Timer - StartTime, 2) > 9 Then MsgBox "Failed to open PDF": Exit Sub  'Safety check
    Loop Until Me.IsPdfOpen(pathToPdf)

    'Copy and wait until copying is completed
    SendKeys "^a"
    SendKeys "^c"
    Do
        Sleep (500)
        If Round(Timer - StartTime, 2) > 18 Then MsgBox "Failed to copy data to clipboard": Exit Sub  'Safety check
    Loop Until Me.GetClipboardStatus = "ClipboardHasData"

    'Paste data into worksheet
    destinationSheet.Activate
    destinationSheet.Range("A1").Select
    destinationSheet.Paste

    'Close pdf
    Call Shell("TaskKill /F /PID " & CStr(pdfId), vbHide)

    'Clear clipboard
    Set myData = New DataObject
    myData.SetText text:=Empty
    myData.PutInClipboard
    Set myData = Nothing

End Sub

Function IsPdfOpen(pathToPdf) As Boolean
'Check if PDF is already opened

    'Build window name (window name is name of the application on Windows task bar)
    Dim windowName As String
    windowName = pathToPdf
    windowName = Mid(windowName, InStrRev(windowName, "\") + 1, Len(windowName) - InStrRev(windowName, "\") + 1)
    windowName = windowName + " - Adobe Acrobat Reader DC"

    'Try to activate application to check if is opened
    On Error Resume Next
    AppActivate windowName, True
    Select Case Err.Number
        Case 5: IsPdfOpen = False
        Case 0: IsPdfOpen = True
        Case Else: Debug.Assert False
    End Select
    On Error GoTo 0

End Function

Function GetClipboardStatus() As String
'Check if copying data to clipboard is completed

    Dim tempString As String
    Dim myData As DataObject

    'Try to put data from clipboard to string to check if operations on clipboard are completed
    On Error Resume Next
    Set myData = New DataObject
    myData.GetFromClipboard
    tempString = myData.GetText(1)
    If Err.Number = 0 Then
        If tempString = "" Then
            GetClipboardStatus = "ClipboardEmpty"
        Else
            GetClipboardStatus = "ClipboardHasData"
        End If
    Else
        GetClipboardStatus = "ClipboardBusy"
    End If
    On Error GoTo 0

    Set myData = Nothing

End Function