用于抓取数据的VBA脚本不起作用

时间:2013-11-29 14:25:41

标签: vba excel-vba web-scraping web-crawler excel

我编写了一个简短的VBA脚本,用于生成URL并下载页面内容并放入新的工作表。但是,数据始终显示在两个页面上,产生以下类型的URL:

对于结果的第一页:

resultat_annuaire.php?loc=01&item=hopital&session=clear   (with 01 being the region) 

第二页:

resultat_annuaire.php?loc=01&item=hopital&page=2   (session=clear is gone, replaced by page=2) 

当我的VBA脚本生成并删除第一页的网址时,它工作正常(即我将95个不同的网页下载到我的Excel中)

但是,当我运行相同的VBA脚本(仅更改生成URL以获取第二页的方式)时,它会下载第一个URL的第2页内容的95倍。

现在我尝试通过执行以下操作简单地在网络浏览器中调整URL:

输入第二页网址:

resultat_annuaire.php?loc=01&item=hopital&page=2

然后将01改为05:

resultat_annuaire.php?loc=05&item=hopital&page=2

再一次,没有任何反应,页面保持不变,就好像我没有将01切换到05.

这是VBA脚本:

Sub Data_scraping()
    For x = 1 To 9
        ActiveWorkbook.Worksheets.Add
        With ActiveSheet.QueryTables.Add(Connection:= _
            "URL;" _
            & "http://etablissements.hopital.fr/resultat_annuaire.php?loc=" _
            & "0" _
            & x _
            & "&item=hopital&session=clear" _
            , Destination:=Range("$A$1"))


        '.CommandType = 0


        .Name = "resultat_annuaire.php?loc=01&item=hopital&session=clear"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    ActiveWindow.SmallScroll Down:=18
    Rows("1:31").Select
    Selection.Delete Shift:=xlUp
    Range("A5").Select
Next x
End Sub

有人可以提供解释或帮助吗?

1 个答案:

答案 0 :(得分:1)

您的宏适用于 page = 2

Sub sof20287920Data_scrapping()
  Dim x, strLoc, strUrl
  Dim wkb

  Set wkb = Workbooks.Add()
  wkb.Activate

  For x = 1 To 9
    ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
    strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
    strUrl = "http://etablissements.hopital.fr/" & strLoc
    With ActiveSheet.QueryTables.Add(Connection:= _
      "URL;" & strUrl _
      , Destination:=Range("$A$1"))


      '.CommandType = 0


      .Name = strLoc
      .FieldNames = True
      .RowNumbers = False
      .FillAdjacentFormulas = False
      .PreserveFormatting = True
      .RefreshOnFileOpen = False
      .BackgroundQuery = True
      .RefreshStyle = xlInsertDeleteCells
      .SavePassword = False
      .SaveData = True
      .AdjustColumnWidth = True
      .RefreshPeriod = 0
      .WebSelectionType = xlEntirePage
      .WebFormatting = xlWebFormattingNone
      .WebPreFormattedTextToColumns = True
      .WebConsecutiveDelimitersAsOne = True
      .WebSingleBlockTextImport = False
      .WebDisableDateRecognition = False
      .WebDisableRedirections = False
      .Refresh BackgroundQuery:=False
    End With

    ActiveWindow.SmallScroll Down:=18
    Rows("1:31").Select
    Selection.Delete Shift:=xlUp
    Range("A5").Select
  Next x

End Sub

即使对于page = 2,session = clear似乎也是必须的,如下所示:

http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2