我编写了一个简短的VBA脚本,用于生成URL并下载页面内容并放入新的工作表。但是,数据始终显示在两个页面上,产生以下类型的URL:
对于结果的第一页:
resultat_annuaire.php?loc=01&item=hopital&session=clear (with 01 being the region)
第二页:
resultat_annuaire.php?loc=01&item=hopital&page=2 (session=clear is gone, replaced by page=2)
当我的VBA脚本生成并删除第一页的网址时,它工作正常(即我将95个不同的网页下载到我的Excel中)
但是,当我运行相同的VBA脚本(仅更改生成URL以获取第二页的方式)时,它会下载第一个URL的第2页内容的95倍。
现在我尝试通过执行以下操作简单地在网络浏览器中调整URL:
输入第二页网址:
resultat_annuaire.php?loc=01&item=hopital&page=2
然后将01改为05:
resultat_annuaire.php?loc=05&item=hopital&page=2
再一次,没有任何反应,页面保持不变,就好像我没有将01切换到05.
这是VBA脚本:
Sub Data_scraping()
For x = 1 To 9
ActiveWorkbook.Worksheets.Add
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" _
& "http://etablissements.hopital.fr/resultat_annuaire.php?loc=" _
& "0" _
& x _
& "&item=hopital&session=clear" _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = "resultat_annuaire.php?loc=01&item=hopital&session=clear"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
有人可以提供解释或帮助吗?
答案 0 :(得分:1)
您的宏适用于 page = 2 ,
Sub sof20287920Data_scrapping()
Dim x, strLoc, strUrl
Dim wkb
Set wkb = Workbooks.Add()
wkb.Activate
For x = 1 To 9
ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
strUrl = "http://etablissements.hopital.fr/" & strLoc
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" & strUrl _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = strLoc
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
即使对于page = 2,session = clear似乎也是必须的,如下所示:
http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2