I am trying to pull data into Excel from the <a data-params=""> tag on the Amazon.com website, using the URLs in column A

Asked: 2018-01-07 14:24:26

Tags: html excel vba excel-vba

I have a set of Amazon URLs (for example https://www.amazon.com/dp/B01LTIORC8) in column A. For each one, I am trying to extract the value "B00M4L4MFC" from href="/dp/B00M4L4MFC/ref=dp_cerb_1" in the following HTML tag and display it in column B.

                            <a data-params="/gp/cerberus/log/click/mid/ATVPDKIKX0DER/asin/B01LTIORC8/sub/B00M4L4MFC/pos/1/dev/WEB" class="a-link-normal cerberus-asin" href="/dp/B00M4L4MFC/ref=dp_cerb_1">

I found the following code online:

Sub GetAboutUsLinks() 
Dim internet As Object
Dim html As Object
Dim myLinks As Object
Dim myLink As Object
Dim result As String
Dim myURL As String
Dim LastRow As Long
Dim i As Long

Set internet = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
LastRow = Cells(Rows.Count, 1).End(xlUp).Row
'Loop through all the web links on the worksheet one by one and then do some things
For i = 2 To LastRow
'Get the link from the worksheet and assign it to the variable
myURL = Sheet1.Cells(i, 1).Value
'Now go to the website
internet.navigate myURL
'Keep the internet explorer visible
 internet.Visible = True
'Ensure that the web page has downloaded completely
While internet.ReadyState <> 4
DoEvents
Wend
'Get the data from the web page that is in the links and assign it to the variable
 result = internet.document.body.innerHTML
'create a new html file
Set html = internet.document
MsgBox html.DocumentElement.innerHTML
'CreateObject("htmlfile")
'now place all the data extracted from the web page into the new html document
 html.body.innerHTML = result
 Set myLinks = html.getElementsByTagName("a")
'loop through the collected links and get a specific link defined by the conditions
For Each myLink In myLinks
If Right$(myLink.href, 13) = "ref=dp_cerb_1" Then
Sheet1.Cells(i, 2).Value = myLink
End If
'go to the next link
Next myLink
'once the last web link on the sheet has been visited close the internet explorer
If i = LastRow Then
internet.Quit
End If
'go to the next web link on the worksheet
Next i

End Sub

2 Answers:

Answer 0 (score: 0)

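If you have already downloaded the data (and you have placed it in a worksheet cell), then you do not need to launch a browser. (By the way, nice syntax there, GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}"). I will blog about it!)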

See this blog post for how to open already-downloaded HTML and parse it.
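As a rough illustration of parsing HTML that is already in hand, here is a minimal sketch. It assumes the page source is available as a string (for example, read from a worksheet cell) and loads it into a late-bound "htmlfile" document instead of a browser. The function name ExtractCerberusAsin and the sample markup in DemoExtract are hypothetical and are not taken from the blog post or the question.

```vba
Option Explicit

' Hypothetical helper (not from the original post): returns the ASIN found in
' the first <a> whose href ends with "ref=dp_cerb_1", or "" if there is none.
Function ExtractCerberusAsin(ByVal pageHtml As String) As String
    Const SUFFIX As String = "ref=dp_cerb_1"
    Dim doc As Object
    Dim link As Object
    Dim href As String
    Dim pos As Long

    ' Late-bound MSHTML document; no browser window is started.
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = pageHtml

    For Each link In doc.getElementsByTagName("a")
        ' & "" turns a missing (Null) attribute into an empty string.
        href = link.getAttribute("href") & ""
        If Right$(href, Len(SUFFIX)) = SUFFIX Then
            pos = InStr(href, "/dp/")
            If pos > 0 Then
                ' The text between "/dp/" and the next "/" is the ASIN.
                ExtractCerberusAsin = Split(Mid$(href, pos + 4), "/")(0)
                Exit Function
            End If
        End If
    Next link
End Function

' Usage sketch with a shortened copy of the tag from the question.
Sub DemoExtract()
    Dim sample As String
    sample = "<a class=""a-link-normal cerberus-asin"" " & _
             "href=""/dp/B00M4L4MFC/ref=dp_cerb_1"">x</a>"
    Debug.Print ExtractCerberusAsin(sample)
End Sub
```

Searching for "/dp/" rather than splitting the whole href is deliberate: depending on the document mode, the href may come back either as written or resolved to a full URL, and the search works in both cases.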

Answer 1 (score: 0)

Could you try this?

{{1}}

End Sub