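I've written a script in VBA to parse the links that lead to the next pages of a torrent site. The script scrapes them successfully. However, the results contain several duplicate links. Is there a technique for collecting only the unique links?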
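Sub TorrentData()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Dim http As New XMLHTTP60, html As New HTMLDocument, post As Object
    Dim x As Long

    ' Fetch the page synchronously and load the response into an HTML document
    With http
        .Open "GET", "https://yts.ag/browse-movies", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' Write every pagination link containing "page" to column A (duplicates included)
    For Each post In html.getElementsByClassName("tsc_pagination")(0).getElementsByTagName("a")
        If InStr(post.href, "page") > 0 Then
            x = x + 1: Cells(x, 1) = post.href
        End If
    Next post
End Sub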
Partial picture of the scraped links:
Before proceeding, please be sure to check the link: "https://www.dropbox.com/s/647x3m65u90a1bu/Description1.txt?dl=0"
Answer 0: (score: 1)
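I couldn't get the website to work. In any case, the proper way to use a dictionary to eliminate duplicates and write to the cells within the same loop should look like this: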
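' Create the dictionary once, before the loop (late binding, no extra reference needed)
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")

For Each Post In html.getElementsByClassName("tsc_pagination")(0).getElementsByTagName("a")
    If InStr(Post.href, "page") > 0 Then
        ' Only write the link the first time it is seen
        If Not dict.Exists(Post.href) Then
            dict.Add Post.href, "whatever information you would like to store"
            x = x + 1
            Cells(x, 1) = Post.href
        End If
    End If
Next Post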
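Since the site could not be reached for testing, here is a minimal offline sketch of the same dictionary technique that can be run without any network access. The names `UniqueItems` and `DemoUniqueItems` and the `example.com` URLs are made up for illustration and are not part of the original script.

```vba
Option Explicit

' Hypothetical helper: returns the distinct entries of an array,
' in the order they were first seen.
Function UniqueItems(items As Variant) As Variant
    Dim dict As Object, item As Variant
    Set dict = CreateObject("Scripting.Dictionary")

    For Each item In items
        ' Exists is checked first because Add raises an error on a duplicate key
        If Not dict.Exists(item) Then dict.Add item, True
    Next item

    ' Keys returns a zero-based Variant array of the stored keys
    UniqueItems = dict.Keys
End Function

' Hypothetical demo using placeholder URLs instead of scraped links
Sub DemoUniqueItems()
    Dim links As Variant, u As Variant
    links = Array("https://example.com/?page=2", _
                  "https://example.com/?page=3", _
                  "https://example.com/?page=2")

    For Each u In UniqueItems(links)
        Debug.Print u   ' page=2 and page=3 are each printed once
    Next u
End Sub
```

Running `DemoUniqueItems` prints to the Immediate window. In the scraping loop, the same `Exists` check on `Post.href` is what keeps repeated pagination links out of the sheet.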