我最近一直在教自己VBA的目标是制作我自己的自定义“网站抓取工具”,以制作网站索引/网站地图。
所以我的代码(完成后)会查看并点击网站上的所有链接。
我可以从主页获取所有链接没问题,并将它们放在Excel中。但是,当我尝试从其他页面获取链接时,我遇到运行时错误,例如:
“运行时错误'70':权限被拒绝”
或
“运行时错误'91':未设置对象变量或宽度块”
然而,当我进入调试模式并进入代码时,我不会遇到任何这些错误。这很奇怪。
有一点需要提及的是,我从具有许多安全设置的系统访问网站,包括对互联网的限制,这就是为什么我有这条线:CreateObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
而不是Create(internetExplorer)
等......
我的代码中的For Each tabLinks In ieLink2
卡住了。我已经删除了链接,但希望每个人仍然可以得到这个想法..
(请注意,我是StackOverflow的新手,所以我不确定发布这样的整个代码是否合适,或者我是否应该只发布一段代码)。
Public Sub CreateSiteIndex_Click()
'Variables
Dim objShell As Object
Dim objShellWind As Object
Dim ie As Object
Dim ieFol As Object
Dim ieData As Object
Dim ieLink As Object
Dim tabData As Object
Dim ieLink2 As Object
Dim listLinks As Object
Dim tabLinks As Object
'Variable for duplicate link check
Dim dupCheck As Boolean
'Variables to check for site links
Dim siteCheck As Boolean
Dim siteAddress As String
Dim siteAddress2 As String
'Variables to check for unwanted links
Dim noCheck As Boolean
Dim noCheckLink As String
Dim noCheckLink2 As String
'Track Shell Windows
Set objShell = CreateObject("Shell.Application")
Set objShellWind = objShell.Windows
'Navigating to webpage
Set ie = CreateObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
ie.Visible = True
ie.Navigate2 "mywebsiteURLisHere(just blocked it out for security purposes"
Set ieFol = CreateObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
'Wait until page is loaded before checking links
Do Until ie.ReadyState >= 4
DoEvents
Loop
Application.Wait Now + TimeSerial(0, 0, 5)
'Get all links from webpage and store as a list/array
Set ieData = ie.Document
'tabData(0) = 0
Set ieLink = ieData.getElementsByTagName("a")
'These are specifications/filters for which links to allow in the Excel sheet
siteAddress = "specific filter here"
siteAddress2 = "another one..."
noCheckLink = "another filter"
'For Loop - goes through each link on page
i = 1
j = 1
k = 1
Cells.Clear
For Each listLinks In ieLink
'Checks to make sure no duplicates before adding link to Excel sheet
'dupCheck becomes TRUE if duplicate
Range("C1").Select
Do Until IsEmpty(ActiveCell)
If (ActiveCell = listLinks.href) Then
dupCheck = True
End If
ActiveCell.Offset(1, 0).Select
ActiveCell.WrapText = True
Loop
'If not a duplicate
If (dupCheck = False) Then
'Check that link is a Horizons link
sC = InStr(listLinks, siteAddress)
sC2 = InStr(listLinks, siteAddress2)
'Check that link is not HOME or TOP OF PAGE
nC = InStr(listLinks, noCheckLink)
If sC > 0 Or sC2 > 0 Then
siteCheck = True
End If
If nC > 0 Then
noCheck = True
Else: noCheck = False
End If
'If link is a Horizons link AND it not linking back to homepage
If (siteCheck = True) Then
If (noCheck = False) Then
'Add links to Excel sheet
ActiveSheet.Cells(i, 3) = listLinks.href
ActiveSheet.Cells(i, 2) = listLinks.innerText
'Convert URL to hyperlink
For Each Cell In Selection
ActiveSheet.Hyperlinks.Add Cells(i, 3), Cell.Value
Next
If ieFol Is Nothing Then Set ieFol = CreateObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
'Follow hyperlink
'ActiveSheet.Cells(i, 3).Hyperlinks(1).Follow
ieFol.Navigate2 ActiveSheet.Cells(i, 3).Value
While ieFol.Busy
'wait for page to load
Wend
Set tabData = ieFol.Document
Set ieLink2 = tabData.getElementsByTagName("a")
For Each tabLinks In ieLink2
If ieFol Is Nothing Then Set ieFol = CreateObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
If tabData Is Nothing Then Set tabData = ieFol.Document
If ieLink2 Is Nothing Then Set ieLink2 = tabData.getElementsByTagName("a")
ActiveSheet.Cells(k, 7) = tabLinks.href
ActiveSheet.Cells(k, 6) = tabLinks.innerText
If ieFol Is Nothing Then Set ieFol = CreateObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
If tabData Is Nothing Then Set tabData = ieFol.Document
If ieLink2 Is Nothing Then Set ieLink2 = tabData.getElementsByTagName("a")
'Check for broken link
If InStr(tabData.Body.innerText, "Page Not Found") > 0 Then
'The link is not valid, flag the cell
ActiveSheet.Cells(i, 3).Interior.Color = vbRed
End If
'ie.Quit
k = k + 1
Next tabLinks
'Next
'Skip link if it links back to homepage
ElseIf (noCheck = True) Then
i = i - 1
End If
siteCheck = False
'If link goes to external site, put it in a different column
ElseIf (siteCheck = False) Then
ActiveSheet.Cells(j, 5) = listLinks.href
ActiveSheet.Cells(j, 4) = listLinks.innerText
j = j + 1
i = i - 1
End If
'If it is a duplicate, skip that link
Else:
dupCheck = False
i = i - 1
End If
i = i + 1
'On to the next!
Next listLinks
'Close the window when done
ie.Quit
End Sub
答案 0 :(得分:0)
仅针对正在寻找其他解决方案的人。我今天遇到了类似的问题。然后我使用" DoEvents"在突出显示错误的行之前,它可以正常工作。