Extracting a similar Word table from each of multiple files into Excel with VBA

Time: 2013-10-05 15:38:56

Tags: excel vba excel-vba

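I have over 300 Word documents that contain Word tables, and I've been trying to write a VBA script for Excel to extract the information I need. I'm brand new to Visual Basic. I need the file name copied to the first cell, with the following cells containing the information I'm trying to extract, followed by the next file name, and so on in a loop until all the Word documents have been searched and extracted. I've tried several different approaches, and the closest code I could find is below. It manages to pull the part numbers, but not the descriptions. It also pulls in extraneous information that isn't needed, but I can work around that if it's a necessary evil. I have a sample Word file (with the sensitive information replaced), but I'm not sure how to attach the Word document or JPEGs of its pages 1 and 2. I know it would help if you could see it, so please tell me how to get it on here or to you.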

So to reiterate:

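  • I need the file name in the first cell (A1)
  • I need certain cells from table 3 of the Word document brought into Excel
  • If possible, I need the values (a mix of letters and numbers) in column B (B2:B?) and column C (C2:C?), then a line break, the next file name (A?), and so on, repeating.

If you have any ideas or suggestions, please let me know. If I can't post pictures or the actual sample file, I'm willing to email it or use any other means necessary to get help.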

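Here is the code I've been trying to adapt. I found it as written; it was meant for the first and last rows of a table, and I've tried to make it work for my purposes, to no avail: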

Sub GetTablesFromWord()

    'this Excel file must be in
    'the same folder with the Word
    'document files that are to be
    'processed.
    Dim wApp As Word.Application
    Dim wDoc As Word.Document
    Dim wTable As Word.Table
    Dim wCell As Word.Cell
    Dim basicPath As String
    Dim fName As String

    Dim myWS As Worksheet
    Dim xlCell As Range
    Dim lastRow As Long
    Dim rCount As Long
    Dim cCount As Long
    Dim RLC As Long
    Dim CLC As Long

    basicPath = ThisWorkbook.Path & Application.PathSeparator
    'change the sheet name as required
    Set myWS = ThisWorkbook.Worksheets("Sheet1")
    'clear any/all previous data on the sheet
    myWS.Cells.Clear

    '"open" Word
    Set wApp = CreateObject("Word.Application")
    'get first .doc file name in the folder
    'with this Excel file
    fName = Dir(basicPath & "*.doc*")
    Do While fName <> ""
        'this puts the filename into column A to
        'help separate the table data in Excel
        myWS.Range("A" & Rows.Count).End(xlUp).Offset(1, 0) = _
            "FILE: [" & fName & "]"
        'open the Word file
        wApp.Documents.Open basicPath & fName
        Set wDoc = wApp.Documents(1)
        'if there is a table in the
        'Word Document, work with it
        If wDoc.Tables.Count > 0 Then
            Set wTable = wDoc.Tables(3)
            rCount = wTable.Rows.Count
            cCount = wTable.Columns.Count
            For RLC = 1 To rCount
                lastRow = myWS.Range("A" & Rows.Count).End(xlUp).Row + 1
                For CLC = 1 To cCount
                    'if there are merged cells in the
                    'Word table, an error will be
                    'generated - ignore the error,
                    'but also won't process the data
                    On Error Resume Next
                    Set wCell = wTable.Cell(RLC, CLC)
                    If Err <> 0 Then
                        Err.Clear
                    Else
                        If CLC = 1 Then
                            Set xlCell = myWS.Range("A" & lastRow)
                            xlCell = wCell
                        Else
                            Set xlCell = myWS.Range("B" & lastRow)
                            xlCell = wCell
                        End If
                    End If
                    On Error GoTo 0
                Next
            Next
            Set wCell = Nothing
            Set wTable = Nothing
        End If ' end of wDoc.Tables.Count test
        wDoc.Close False
        Set wDoc = Nothing
        fName = Dir()
        ' gets next .doc* filename in the folder
    Loop
    wApp.Quit
    Set wApp = Nothing
    MsgBox "Task Completed"
End Sub

1 Answer:

Answer 0: (score: 0)

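This code loops through all the .docx files in a folder, extracts data into the spreadsheet, closes the Word document, and then moves on to the next document. The name of each Word document goes into column A, and a value from the 3rd table in the document goes into column B. This should be a good starting point for you to build on.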

Sub wordScrape()

    Dim wrdDoc As Object, objFiles As Object, fso As Object, wordApp As Object
    Dim wd As Object ' each file in the folder
    Dim sh1 As Worksheet
    Dim FolderName As String
    Dim x As Long

    FolderName = "C:\code" ' Change this to the folder containing your word documents

    Set sh1 = ThisWorkbook.Sheets(1)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set wordApp = CreateObject("Word.application")
    Set objFiles = fso.GetFolder(FolderName).Files

    x = 1
    For Each wd In objFiles
        ' Only .docx files; skip Word's temporary "~$" lock files
        If InStr(wd.Name, ".docx") > 0 And InStr(wd.Name, "~") = 0 Then
            ' ReadOnly:=True is a named argument ("ReadOnly = True" would be a comparison)
            Set wrdDoc = wordApp.Documents.Open(wd.Path, ReadOnly:=True)
            sh1.Cells(x, 1) = wd.Name
            ' Clean strips the non-printing end-of-cell characters
            sh1.Cells(x, 2) = Application.WorksheetFunction.Clean(wrdDoc.Tables(3).Cell(Row:=3, Column:=2).Range.Text)
            'sh1.Cells(x, 3) = ....more extracted data....
            x = x + 1
            wrdDoc.Close False ' close without saving
        End If
    Next wd
    wordApp.Quit
End Sub
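
To get the layout described in the question (file name in column A, then the table values in columns B and C on the rows below it), something along these lines might work. This is only a sketch: `CleanCellText` and `CopyTableThree` are hypothetical helper names, and it assumes the values you want are in the first two columns of table 3, which I can't confirm without seeing a sample document. It walks `Range.Cells` rather than calling `Table.Cell(row, column)`, so merged cells should not raise an error.

```vba
' Hypothetical helper: strips the end-of-cell marker (Chr(13) & Chr(7))
' that Word appends to the text of every table cell.
Function CleanCellText(ByVal rawText As String) As String
    Dim s As String
    s = Replace(rawText, Chr(7), "")   ' cell marker
    s = Replace(s, vbCr, " ")          ' paragraph marks become spaces
    CleanCellText = Trim$(s)
End Function

' Hypothetical helper: writes fileName to column A of startRow, then copies
' columns 1 and 2 of table 3 to columns B and C on the rows below it.
' Returns the next free row. Assumes the wanted data is in those two columns.
Function CopyTableThree(ByVal wrdDoc As Object, ByVal ws As Worksheet, _
                        ByVal fileName As String, ByVal startRow As Long) As Long
    Dim c As Object      ' Word.Cell, late bound
    Dim maxRow As Long   ' highest table row seen so far

    ws.Cells(startRow, 1).Value = fileName
    If wrdDoc.Tables.Count >= 3 Then
        ' Range.Cells visits each existing cell, merged or not
        For Each c In wrdDoc.Tables(3).Range.Cells
            If c.ColumnIndex <= 2 Then
                ws.Cells(startRow + c.RowIndex, c.ColumnIndex + 1).Value = _
                    CleanCellText(c.Range.Text)
                If c.RowIndex > maxRow Then maxRow = c.RowIndex
            End If
        Next c
    End If
    CopyTableThree = startRow + maxRow + 1
End Function
```

Inside the loop in `wordScrape`, the two `sh1.Cells(...)` assignments and `x = x + 1` could then be replaced with `x = CopyTableThree(wrdDoc, sh1, wd.Name, x)`. With 300+ documents the row counter can grow past what an `Integer` holds, so `x` is better declared `As Long`.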