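I'm not trying to do anything complicated; I just want to retrieve certain titles from a website. The first button is only there for testing... The thing is, even the "lala" text doesn't show up, which means it never gets into the loop in the first place...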
Public Class Form1
    Function ElementsByClass(document As HtmlDocument, classname As String)
        Dim coll As New Collection
        For Each elem As HtmlElement In document.All
            If elem.GetAttribute("appcenter").ToLower.Split(" ").Contains(classname.ToLower) Then
                coll.Add(elem)
            End If
        Next
        Return coll
    End Function
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim wb As New System.Net.WebClient
        wb.Headers.Add("user-agent", "Only a test!")
        Dim sourceString As String = wb.DownloadString("http://www.ign.com/games/upcoming")
        RichTextBox1.Text = sourceString
    End Sub
    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim elementss As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("div")
        For Each pElem As HtmlElement In elementss
            If pElem.GetAttribute("class") = "item-title" Then
                RichTextBox1.Text = "lala"
                RichTextBox1.Text = pElem.InnerHtml
            End If
        Next
    End Sub
End Class
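Answer 0 (score: 0)
OK, what I can give you is the title of each upcoming game.
Something like this should do the trick for you.
I'd recommend the HTML Agility Pack for larger web-scraping jobs, but since you only want a few strings, this solution should be fine for you.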
Imports System.Net
Imports System.Text.RegularExpressions
Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim websiteURL As String = "http://www.ign.com/games/upcoming"
        getTiles(websiteURL) 'Where you call this from is up to you.
    End Sub
    Private Sub getTiles(website As String)
        ListBox1.Items.Clear() 'Clear old results or any errors.
        Dim tempTitles As New List(Of String)()
        Try 'If the website happens to be offline, at least your application won't crash.
            Dim content As String
            Using webClient As New WebClient() 'Dispose the client once the download is done.
                webClient.Headers.Add("user-agent", "null")
                content = webClient.DownloadString(website)
            End Using
            Dim pattern As String = "alt=""(?<Data>[^>]*)""/>"
            'Since you are only pulling a few strings, I thought a regex would be better.
            For Each title As Match In Regex.Matches(content, pattern)
                tempTitles.Add(title.Groups("Data").Value)
            Next
            Dim titles = tempTitles.Distinct().ToArray() 'Remove duplicate titles.
            For Each title As String In titles
                ListBox1.Items.Add(title) 'What you do with the values from here is up to you.
            Next
            If titles.Length = 0 Then
                ListBox1.Items.Add("Nothing Found")
            End If
        Catch ex As Exception
            ListBox1.Items.Add(ex.Message) 'Show the error instead of crashing.
        End Try
    End Sub
End Class
I've written a few comments in the code to answer any questions you might have.
If I've missed anything, feel free to leave a comment below. Happy coding!
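For larger scrapes, the HTML Agility Pack route mentioned above might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is installed, and that the titles sit in div elements carrying the class item-title. That class name is taken from the question's code and has not been checked against the live page. TitleParser and ExtractTitles are hypothetical names used only for illustration.

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module TitleParser
    ' Hypothetical helper: returns the text of every <div> whose class list
    ' contains "item-title" (class name assumed from the question's code).
    Function ExtractTitles(html As String) As List(Of String)
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)
        Dim titles As New List(Of String)
        Dim xpath As String =
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' item-title ')]"
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes(xpath)
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                titles.Add(HtmlEntity.DeEntitize(node.InnerText).Trim())
            Next
        End If
        Return titles
    End Function
End Module
```

You would pass it the string returned by webClient.DownloadString(website) and add the results to ListBox1 the same way as above.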
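Answer 1 (score: 0)
Sorry, I'm not good at explaining these things, so please analyze the regular expression on the website I mentioned in my earlier comment.
The website you specified lists the games like this: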
<a class="product_spot " href="/browse?nav=16k-3-rime,28zu0" data-date="05/06/2017"><img src="/gs/pages/landing/upcoming-video-games/images/223x120_rime.jpg"><p>RiME<br><br><span>05/06/2017</span></p></a>
<a class="product_spot " href="/browse/games?nav=16k-3-the+surge,28zu0,13ffff2418" data-date="05/16/2017"><img src="/gs/pages/landing/upcoming-video-games/images/223x120_thesurge.jpg"><p>The Surge<br><br><span>05/16/2017</span></p></a>
So this regex will match it:
<a class=.product_spot\s.\shref=.(?:.+?)\sdata-date=.(?:.+?)><img\ssrc=(?:.+?)><p>(.+?)<br><br><span>(?:.+?)<\/span><\/p><\/a>
Imports System.Net
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim input As String
        Using wc As New WebClient 'Dispose the client once the download is done.
            input = wc.DownloadString("http://www.gamestop.com/collection/upcoming-video-games")
        End Using
        Dim games As New List(Of String)
        'Group 1 captures the game title, group 2 the release date.
        Dim matches As MatchCollection = Regex.Matches(input, "<a class=.product_spot\s.\shref=.(?:.+?)\sdata-date=.(?:.+?)><img\ssrc=(?:.+?)><p>(.+?)<br><br><span>(.+?)<\/span><\/p><\/a>")
        For Each item As Match In matches
            games.Add(item.Groups(1).Value)
        Next
        For Each item As String In games
            Console.WriteLine(item)
        Next
        Console.ReadLine() 'Keep the console window open.
    End Sub
End Module
Output:
Dead Island 2
Final Fantasy XV
De-Formers
Injustice 2
...
Killing Floor 2
Tales of Berseria
Nintendo Switch
Mass Effect Andromeda
MLB The Show 17
Has-Been Heroes
Ride 2
...
..
.
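If you want to check the pattern without hitting the network, a minimal sketch like the following runs it against the two sample lines quoted above. RegexCheck and ExtractTitles are hypothetical names used only for this illustration; the pattern is the same one passed to Regex.Matches in the program above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RegexCheck
    ' Same pattern as in the program above: group 1 is the title, group 2 the date.
    Const Pattern As String = "<a class=.product_spot\s.\shref=.(?:.+?)\sdata-date=.(?:.+?)><img\ssrc=(?:.+?)><p>(.+?)<br><br><span>(.+?)<\/span><\/p><\/a>"

    ' Hypothetical helper: returns the first capture group of every match.
    Function ExtractTitles(html As String) As List(Of String)
        Dim titles As New List(Of String)
        For Each m As Match In Regex.Matches(html, Pattern)
            titles.Add(m.Groups(1).Value)
        Next
        Return titles
    End Function

    Sub Main()
        ' The two sample anchors quoted in this answer, one per line.
        Dim sample As String =
            "<a class=""product_spot "" href=""/browse?nav=16k-3-rime,28zu0"" data-date=""05/06/2017""><img src=""/gs/pages/landing/upcoming-video-games/images/223x120_rime.jpg""><p>RiME<br><br><span>05/06/2017</span></p></a>" &
            Environment.NewLine &
            "<a class=""product_spot "" href=""/browse/games?nav=16k-3-the+surge,28zu0,13ffff2418"" data-date=""05/16/2017""><img src=""/gs/pages/landing/upcoming-video-games/images/223x120_thesurge.jpg""><p>The Surge<br><br><span>05/16/2017</span></p></a>"
        For Each title As String In ExtractTitles(sample)
            Console.WriteLine(title) ' prints RiME, then The Surge
        Next
    End Sub
End Module
```

Keep in mind that a pattern this specific depends on the exact attribute order and markup of the page, so it will stop matching if the site changes its HTML.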