VBA: scrape a website to get all artist names and albums?

Date: 2017-07-06 19:57:40

Tags: html excel vba web-scraping

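OK, I'm completely lost as to what I need to do...

Here is the code. I first grab some links from the website, and then I need to follow each link and pull some data from the page it points to...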

Sheets("LINKS TEMP").Activate

Dim httpObject As Object
Set httpObject = CreateObject("MSXML2.XMLHTTP")
Dim doc As Object
Set doc = CreateObject("htmlfile")
Dim links As Variant
Dim l As Variant

With httpObject
    .Open "GET", "http://www.smoothjazz.com/charts/", False
    .Send
    Do Until httpObject.ReadyState = 4
    Loop
    doc.body.innerhtml = .responseText
    Set links = doc.getElementsByTagName("a")
    I = 1
    For Each l In links
        Sheets("LINKS TEMP").Cells(I, 2).Value = l.href
        I = I + 1
    Next l
End With

Sheets("LINKS TEMP").Activate
lrLINKS = Cells(Rows.Count, "B").End(xlUp).Row
copyLINKS = 1
For deleteLINKS = 1 To lrLINKS
If Cells(deleteLINKS, 2).Value Like "*votesbytrack*" Then GoTo copyLOOP
GoTo nextLOOP
copyLOOP:
Cells(deleteLINKS, 2).Copy
Cells(copyLINKS, 1).PasteSpecial xlPasteValues
copyLINKS = copyLINKS + 1
nextLOOP:
Next deleteLINKS
LRlinks2 = Cells(Rows.Count, "A").End(xlUp).Row

For formatLINKS = 1 To LRlinks2
Cells(formatLINKS, 1).Value = "http://smoothjazz.com/charts/" & _
Right(Cells(formatLINKS, 1), Len(Cells(formatLINKS, 1)) - 6)
nextLINK = Sheets("LINKS TEMP").Cells(formatLINKS, 1).Text
Range("B:B").ClearContents

With httpObject
    .Open "GET", Sheets("LINKS TEMP").Cells(formatLINKS, 1).Text, False
    .Send
    Do Until httpObject.ReadyState = 4
    Loop
    doc.body.innerhtml = .responseText
    Set elem = doc.getElementsByClassName("trackingalbumname")
    x = 1
    For Each l In elem
        Sheets("HEADER TEMP").Cells(x, 1).Value = l.span
        x = x + 1
    Next l
End With

Next formatLINKS

Why does it have a problem with GetElementsByClassName?

Here is the HTML:

<div class='content'>


	
    <p>&nbsp;</p>
    <table width="601" border="0" cellspacing="0" cellpadding="0">
      <tr>
        <td class="trackingtitle">
			
					<span class="trackingartistname">NATHAN EAST<br/></span>
					<span class="trackingalbumname">Reverence <br/></span>

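That is just one artist/album, but the code needs to run through the whole HTML and find all of them!

What do I need to do... besides start studying hard?!

1 answer:

Answer 0 (score: 0)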

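I usually use InternetExplorer.Application instead of MSXML2.XMLHTTP. I have heard that htmlfile is deprecated as of IE 11 and will stop working. As for getElementsByClassName, it should be called as .document.getElementsByClassName. Also, I'm not sure whether you want the "span" element itself or one of its properties. I used tagName as a demo. To get the artist's value, swap tagName for innerText or innerHTML. You can watch the object's properties in the Locals window while stepping through the code, and look for the property whose value matches the data you are trying to scrape.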
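The approach described above could look roughly like the sketch below. It is not the answerer's original code. It assumes Internet Explorer automation is still available on the machine, that the page renders in a document mode where getElementsByClassName exists, and that every page uses the trackingartistname / trackingalbumname classes shown in the question. The procedure name and the URL passed to Navigate are placeholders; in practice you would pass each "votesbytrack" link collected earlier.

```vba
Option Explicit

' Sketch only: assumes IE automation works on this machine and that the
' page structure matches the HTML fragment shown in the question.
Sub ListArtistsAndAlbumsWithIE()
    Dim ie As Object
    Dim artists As Object
    Dim albums As Object
    Dim i As Long

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False

    ' Placeholder URL: substitute one of the collected "votesbytrack" links.
    ie.Navigate "http://www.smoothjazz.com/charts/"

    ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE).
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
    Loop

    ' Note the call goes through the Document object of the browser.
    Set artists = ie.Document.getElementsByClassName("trackingartistname")
    Set albums = ie.Document.getElementsByClassName("trackingalbumname")

    ' Collections are zero-based; worksheet rows start at 1.
    For i = 0 To artists.Length - 1
        Sheets("HEADER TEMP").Cells(i + 1, 1).Value = Trim$(artists.Item(i).innerText)
        If i < albums.Length Then
            Sheets("HEADER TEMP").Cells(i + 1, 2).Value = Trim$(albums.Item(i).innerText)
        End If
    Next i

    ie.Quit
End Sub
```

Reading innerText rather than the span object itself is what replaces the failing `l.span` line in the question: a cell needs a string, not an HTML element.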


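If you would rather keep the MSXML2.XMLHTTP and htmlfile objects from the question, one common workaround is to avoid getElementsByClassName altogether: fetch every span with getElementsByTagName and check its className property. The sketch below is an alternative to the answer's technique, not part of it. The procedure name is hypothetical, the URL is a placeholder for each collected link, and it assumes the class names match the HTML fragment in the question exactly.

```vba
Option Explicit

' Sketch only: keeps the late-bound XMLHTTP + htmlfile objects from the
' question and filters spans by class name instead of calling
' getElementsByClassName on the htmlfile document.
Sub ListArtistsAndAlbumsWithXmlHttp()
    Dim http As Object
    Dim doc As Object
    Dim spans As Object
    Dim s As Object
    Dim artistRow As Long
    Dim albumRow As Long

    Set http = CreateObject("MSXML2.XMLHTTP")
    Set doc = CreateObject("htmlfile")

    ' Placeholder URL: substitute one of the collected "votesbytrack" links.
    ' The request is synchronous (False), so no ReadyState loop is needed.
    http.Open "GET", "http://www.smoothjazz.com/charts/", False
    http.send
    doc.body.innerHTML = http.responseText

    Set spans = doc.getElementsByTagName("span")
    artistRow = 1
    albumRow = 1

    For Each s In spans
        Select Case s.className
            Case "trackingartistname"
                Sheets("HEADER TEMP").Cells(artistRow, 1).Value = Trim$(s.innerText)
                artistRow = artistRow + 1
            Case "trackingalbumname"
                Sheets("HEADER TEMP").Cells(albumRow, 2).Value = Trim$(s.innerText)
                albumRow = albumRow + 1
        End Select
    Next s
End Sub
```

To process several pages, the row counters would need to be kept outside the per-link loop; resetting them to 1 for every link, as the question's `x = 1` does, overwrites the results of the previous page.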