Trouble getting values from a website - vb.net

Date: 2017-05-02 07:54:49

Tags: vb.net

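I'm not trying to do anything complicated; I just want to retrieve certain titles from a website. The first button is only for testing. The problem is that even the "lala" text never shows up, which means the code never gets inside the loop in the first place.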

Public Class Form1
    Function ElementsByClass(document As HtmlDocument, classname As String)
        Dim coll As New Collection
        For Each elem As HtmlElement In document.All
            If elem.GetAttribute("appcenter").ToLower.Split(" ").Contains(classname.ToLower) Then
                coll.Add(elem)
            End If
        Next
        Return coll
    End Function
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim wb As New System.Net.WebClient
        wb.Headers.Add("user-agent", "Only a test!")
        Dim sourceString As String = wb.DownloadString("http://www.ign.com/games/upcoming")
        RichTextBox1.Text = sourceString
    End Sub

    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim elementss As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("div")
        For Each pElem As HtmlElement In elementss
            If pElem.GetAttribute("class") = "item-title" Then
                RichTextBox1.Text = "lala"
                RichTextBox1.Text = pElem.InnerHtml
            End If
        Next
    End Sub
End Class

2 answers:

Answer 0 (score: 0)

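OK, what I can give you is the title of every upcoming game.
Something like this should do the trick for you.
For larger web scrapers I would recommend HTML Agility Pack, but since you only want a few strings, this solution should be fine for you.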

Imports System.Net
Imports System.Text.RegularExpressions

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim websiteURL As String = "http://www.ign.com/games/upcoming"
    GetTitles(websiteURL) 'where you access this is up to you
End Sub

Private Sub GetTitles(website As String)

    ListBox1.Items.Clear() 'Clear old results or any errors

    Dim tempTitles As New List(Of String)()
    Dim webClient As New WebClient()
    webClient.Headers.Add("user-agent", "null")

    Try 'If the website happens to go offline, at least your application won't crash.
        Dim content As String = webClient.DownloadString(website)
        Dim pattern As String = "alt=""(?<Data>[^>]*)""/>"
        For Each title As Match In (New Regex(pattern).Matches(content)) 'Since you are only pulling a few strings, I thought a regex would be better.
            tempTitles.Add(title.Groups("Data").Value)
        Next
        Dim titles = tempTitles.Distinct().ToArray() 'remove duplicate titles

        For Each title As String In titles
            ListBox1.Items.Add(title) 'what you do with the values from here is up to you.
        Next

        If titles.Count() = 0 Then
            ListBox1.Items.Add("Nothing Found")
        End If
    Catch ex As Exception
        ListBox1.Items.Add(ex.Message)
        Return
    End Try
End Sub

I've written some comments in the code to answer any questions you might have.
If I've missed anything, feel free to leave a comment below. Happy coding!
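The pattern used in this answer can be tried offline before pointing it at a live page. The following is only a minimal sketch: the module name `TitlePatternDemo`, the helper `ExtractAltTexts`, and the sample markup are all hypothetical and are not taken from the real site.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module TitlePatternDemo
    ' Hypothetical helper: returns the distinct values captured by the
    ' "Data" group of the answer's pattern.
    Function ExtractAltTexts(html As String) As List(Of String)
        Dim pattern As String = "alt=""(?<Data>[^>]*)""/>"
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(html, pattern)
            found.Add(m.Groups("Data").Value)
        Next
        Return found.Distinct().ToList()
    End Function

    Sub Main()
        ' Made-up markup for illustration only; the real page may differ.
        Dim sample As String =
            "<img src=""a.jpg"" alt=""Game One""/>" &
            "<img src=""b.jpg"" alt=""Game Two""/>" &
            "<img src=""a.jpg"" alt=""Game One""/>"

        For Each title As String In ExtractAltTexts(sample)
            Console.WriteLine(title)
        Next
    End Sub
End Module
```

Note that the pattern only matches when `alt` is the last attribute and the tag closes with `"/>` and no space before the slash, so a tag written as `alt="..." />` would be skipped.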

Answer 1 (score: 0)

Sorry, I'm not good at explaining these things, so please analyze the regular expression on the website I mentioned in my earlier comment.

The website you specified lists the games like this:

<a class="product_spot " href="/browse?nav=16k-3-rime,28zu0" data-date="05/06/2017"><img src="/gs/pages/landing/upcoming-video-games/images/223x120_rime.jpg"><p>RiME<br><br><span>05/06/2017</span></p></a>
<a class="product_spot " href="/browse/games?nav=16k-3-the+surge,28zu0,13ffff2418" data-date="05/16/2017"><img src="/gs/pages/landing/upcoming-video-games/images/223x120_thesurge.jpg"><p>The Surge<br><br><span>05/16/2017</span></p></a>

So this regular expression will match it.

<a class=.product_spot\s.\shref=.(?:.+?)\sdata-date=.(?:.+?)><img\ssrc=(?:.+?)><p>(.+?)<br><br><span>(?:.+?)<\/span><\/p><\/a>

Imports System.Net
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim wc As New WebClient
        Dim input As String = wc.DownloadString("http://www.gamestop.com/collection/upcoming-video-games")
        Dim games As New List(Of String)
        Dim matchCollection As MatchCollection = Regex.Matches(input, "<a class=.product_spot\s.\shref=.(?:.+?)\sdata-date=.(?:.+?)><img\ssrc=(?:.+?)><p>(.+?)<br><br><span>(.+?)<\/span><\/p><\/a>")

        For Each item As Match In matchCollection
            games.Add(item.Groups(1).Value)
        Next

        For Each item As String In games
            Console.WriteLine(item)
        Next

        Console.ReadLine()
    End Sub
End Module

Output:

Dead Island 2
Final Fantasy XV
De-Formers
Injustice 2
...
Killing Floor 2
Tales of Berseria
Nintendo Switch
Mass Effect Andromeda
MLB The Show 17
Has-Been Heroes
Ride 2
...
..
.
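
To check the pattern without downloading anything, it can be run against the two sample anchors quoted in this answer. This is a rough sketch only: the module name `ProductSpotDemo` and the helper `ExtractGames` are hypothetical, and the markup on the live page may have changed since the answer was written.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ProductSpotDemo
    ' Same pattern as in the answer's code: group 1 is the title, group 2 the date.
    Const Pattern As String = "<a class=.product_spot\s.\shref=.(?:.+?)\sdata-date=.(?:.+?)><img\ssrc=(?:.+?)><p>(.+?)<br><br><span>(.+?)<\/span><\/p><\/a>"

    ' Hypothetical helper: returns (title, date) pairs found in the markup.
    Function ExtractGames(html As String) As List(Of KeyValuePair(Of String, String))
        Dim games As New List(Of KeyValuePair(Of String, String))()
        For Each m As Match In Regex.Matches(html, Pattern)
            games.Add(New KeyValuePair(Of String, String)(m.Groups(1).Value, m.Groups(2).Value))
        Next
        Return games
    End Function

    Sub Main()
        ' The two anchors quoted in the answer, one per line.
        Dim sample As String =
            "<a class=""product_spot "" href=""/browse?nav=16k-3-rime,28zu0"" data-date=""05/06/2017"">" &
            "<img src=""/gs/pages/landing/upcoming-video-games/images/223x120_rime.jpg"">" &
            "<p>RiME<br><br><span>05/06/2017</span></p></a>" &
            Environment.NewLine &
            "<a class=""product_spot "" href=""/browse/games?nav=16k-3-the+surge,28zu0,13ffff2418"" data-date=""05/16/2017"">" &
            "<img src=""/gs/pages/landing/upcoming-video-games/images/223x120_thesurge.jpg"">" &
            "<p>The Surge<br><br><span>05/16/2017</span></p></a>"

        For Each game In ExtractGames(sample)
            Console.WriteLine(game.Key & " (" & game.Value & ")")
        Next
        ' Prints:
        ' RiME (05/06/2017)
        ' The Surge (05/16/2017)
    End Sub
End Module
```

Because the pattern depends on the exact attribute order and on the `<br><br><span>` sequence, any change to the page markup will make it silently return no matches, so an empty result is worth checking for.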