Scraping a website with VBA, but it doesn't work. What should I do?

Asked: 2016-04-25 18:27:45

Tags: excel vba extract extraction

I have this website: http://ga.healthinspections.us/georgia/search.cfm?start=21&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&

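I have written some code, but it doesn't work even for the first page. My goal is to extract the details of each establishment from every page, for example: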

Column 1: 103 West Lounge (Food Service Inspections)
Column 2: 103 WEST PACES FERRY RD ATLANTA, GA 30318
(Skip this detail) View inspections:
Column 3: July 10, 2012 Score: 92, Grade: A
Column 4: July 26, 2013 Score: 90, Grade: A
Column 5: February 19, 2014 Score: 98, Grade: A
Column 6: December 12, 2014 Score: 100, Grade: A
Column 7: November 13, 2015 Score: 99, Grade: A

At the moment the code only extracts URLs, without any of the details. I need help seeing what is wrong or what to change:

Sub Test()
    Dim IE As New InternetExplorer
    Dim html As HTMLDocument
    Dim link As Object
    Dim ws As Worksheet

    Set ws = Sheets("Sheet1")

    Application.ScreenUpdating = False
    Set IE = New InternetExplorer

    ' Test 2 pages (page 2 and page 3) starting from page 2. So far so good.
    For i = 2 To 4 Step 2

        myurl = "http://ga.healthinspections.us/georgia/search.cfm?start=" & i & "1&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&"
        IE.Visible = False
        IE.navigate myurl
        Do
            DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE

        Set html = IE.document
        ' I assume the problem is here: I still need code that finds these details.
        Set link = html.getElementsByTagName("a")

        ' This part was intended to test whether I can extract at least one detail.
        For m = 1 To 2
            For Each myurl In link
                Cells(m, 1) = link

            Next
        Next m
    Next i
    ' I also tried testing with MsgBox, but no luck either
    'MsgBox link

    IE.quit
    Set IE = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True

End Sub

Maybe I messed something up, or I simply lack the knowledge. :) Any help would be appreciated.

2 answers:

Answer 0 (score: 0)

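Have you set the references to Microsoft Internet Controls and the Microsoft HTML Object Library? If so, try replacing the declarations at the top of your code with the following.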

Dim IE As New InternetExplorer
Dim html As MSHTML.HTMLDocument
Dim link As Object
Dim ws As Worksheet

Set ws = Sheets("Sheet1")

Application.ScreenUpdating = False
Set IE = New InternetExplorer
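
Beyond the declarations, the inner loop in the question assigns the whole collection (link) to the same cell on every pass, which may explain why no details show up. The following is only a minimal sketch of one way to write each anchor to its own row. It assumes both references mentioned above are set and that a sheet named Sheet1 exists. BuildSearchUrl and ListLinks are hypothetical names introduced for illustration, and the sketch makes no assumption about how the site marks up the establishment details.

```vba
Option Explicit

' Hypothetical helper: builds the search URL for a given value of "start".
' All other query parameters are copied from the URL in the question.
Function BuildSearchUrl(ByVal startIndex As Long) As String
    BuildSearchUrl = "http://ga.healthinspections.us/georgia/search.cfm?start=" & _
        startIndex & "&1=1&f=s&r=ANY&s=&inspectionType=Food" & _
        "&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&"
End Function

' Hypothetical example: writes the text and href of every anchor on one
' results page to columns A and B, one anchor per row.
Sub ListLinks()
    Dim IE As InternetExplorer
    Dim html As MSHTML.HTMLDocument
    Dim anchor As Object
    Dim ws As Worksheet
    Dim rowIndex As Long

    Set ws = ThisWorkbook.Sheets("Sheet1")
    Set IE = New InternetExplorer
    IE.Visible = False

    IE.navigate BuildSearchUrl(21)
    ' Wait until the browser is idle and the document has finished loading.
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set html = IE.document
    rowIndex = 1
    For Each anchor In html.getElementsByTagName("a")
        ws.Cells(rowIndex, 1).Value = anchor.innerText
        ws.Cells(rowIndex, 2).Value = anchor.href
        rowIndex = rowIndex + 1
    Next anchor

    IE.Quit
    Set IE = Nothing
End Sub
```

Incrementing rowIndex inside the loop is what keeps each link on a separate row. To visit several pages, ListLinks could be called in a loop with different start values, although the page size used by the site is not stated in the question.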

Answer 1 (score: 0)

You can get the innerText of every element with the following approach.

Sub DumpData()

Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True

URL = "http://ga.healthinspections.us/georgia/search.cfm?start=1&1=1&f=s&r=ANY&s=&inspectionType=Food&sd=03/26/2016&ed=04/25/2016&useDate=NO&county=Fulton&"

IE.Navigate2 URL
'Wait for site to fully load (4 is the value of READYSTATE_COMPLETE)
Do While IE.Busy Or IE.readyState <> 4
   DoEvents
Loop

With Sheets("Sheet1")
   .Cells.ClearContents
   RowCount = 1
   'Dump one row per element: tag name, id, class and the first 1024 characters of text
   For Each itm In IE.Document.all
      .Range("A" & RowCount) = itm.tagName
      .Range("B" & RowCount) = itm.ID
      .Range("C" & RowCount) = itm.className
      .Range("D" & RowCount) = Left(itm.innerText, 1024)

      RowCount = RowCount + 1
   Next itm
End With
End Sub

I got this from a nice guy named Joel; he is just that kind of person.

Once the data is in the worksheet, do some simple cleanup to get rid of the extra stuff, and you should be all set.
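
As one possible form of that cleanup, here is a minimal sketch that deletes every dumped row whose text column is empty. It assumes the layout produced by DumpData above (tag name in column A, text in column D, sheet named Sheet1). RemoveEmptyTextRows is a hypothetical name, and which rows count as "extra" for your purposes is a judgment call this sketch does not settle.

```vba
Option Explicit

' Hypothetical cleanup step: removes rows that carry no text in column D.
Sub RemoveEmptyTextRows()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ThisWorkbook.Sheets("Sheet1")
    ' Column A holds the tag name, so it is filled on every dumped row.
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    ' Walk upwards so that deleting a row does not shift rows still to be checked.
    For r = lastRow To 1 Step -1
        If Len(Trim$(ws.Cells(r, "D").Value)) = 0 Then
            ws.Rows(r).Delete
        End If
    Next r
End Sub
```

Run it once after DumpData has finished. Parent elements repeat the text of their children in this kind of dump, so further filtering by tag name in column A will probably still be needed.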