Question

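I am trying to scrape usernames from a website, following this tutorial:

https://www.youtube.com/watch?v=FpAvBOhDrYk (part one)

https://www.youtube.com/watch?src_vid=FpAvBOhDrYk (part two)

I followed everything but could not get it to work. Here is the VB.NET code I am using: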

Imports System.Text.RegularExpressions

Public Class Form1

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim Request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://statigr.am/tag/anime")
    Dim response As System.Net.HttpWebResponse = Request.GetResponse

    Dim rs As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())

    Dim rssourcecode As String = rs.ReadToEnd

    '<a href="/hannahotaku">hannahotaku</a>

    Dim r As New System.Text.RegularExpressions.Regex("<a href=""/.*"">hannahotaku</a>")
    Dim matches As MatchCollection = r.Matches(rssourcecode)


    For Each itemcode As Match In matches
        ListBox1.Items.Add(itemcode.Value.Split("""").GetValue(1))

    Next


End Sub
End Class

As you can see, I am using the Statigram website. The source I am trying to scrape is:

<a href="/hannahotaku">hannahotaku</a>

Please let me know what I am doing wrong. The part I want to scrape is the username in:

(<a href="/**whatever username here**"></a>)

Answer 1

If you want to capture the whole link:

(<a href="\/.+?">hannahotaku<\/a>)

If you want to capture the username:

<a href="\/(.+?)">hannahotaku<\/a>

From what I can see, in VB.NET it would probably be:

<a href=""/(.+?)"">hannahotaku</a>

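The lazy quantifier (+?) makes the group match only as much as is needed and nothing extra, and the plus sign requires at least one character, so the username cannot be completely empty.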
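To illustrate the difference between the greedy and lazy forms, here is a minimal sketch that runs both patterns against an in-memory string rather than the live page. The module name LazyMatchSketch and the sample markup are assumptions made for illustration; the real page source may look different.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyMatchSketch
    Sub Main()
        ' Hypothetical sample markup (an assumption): the same link appears twice.
        Dim html As String = "<a href=""/hannahotaku"">hannahotaku</a> and again <a href=""/hannahotaku"">hannahotaku</a>"

        ' Greedy: .* grabs as much as it can, so group 1 runs on
        ' through the first link and into the second one.
        Dim greedyMatch As Match = Regex.Match(html, "<a href=""/(.*)"">hannahotaku</a>")

        ' Lazy: .+? stops at the first place where the rest of the pattern fits.
        Dim lazyMatch As Match = Regex.Match(html, "<a href=""/(.+?)"">hannahotaku</a>")

        Console.WriteLine("Greedy group: " & greedyMatch.Groups(1).Value)
        Console.WriteLine("Lazy group:   " & lazyMatch.Groups(1).Value)   ' hannahotaku
    End Sub
End Module
```

With this sample, the greedy group contains markup from both links, while the lazy group contains only the username. A negated character class such as [^""]+ in place of .+? is a common alternative, because it cannot run past the closing quote of the href.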

P.S. I am not very familiar with VB.NET, so let me know if some adaptation is needed.


Answer 2

Use this regular expression instead:

"<div><div>([^<]+)</div>"

In the For Each loop, use itemcode.Groups(1).Value instead of itemcode.Value.Split("""").GetValue(1). This gives you the part between the div tags.
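As a quick way to check what Groups(1).Value returns without making a web request, the sketch below applies the same pattern to a hard-coded string. The module name GroupValueSketch and the sample markup are assumptions made for illustration and are not copied from the real page.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupValueSketch
    Sub Main()
        ' Hypothetical sample markup (an assumption, not the real page source).
        Dim sample As String = "<div><div>hannahotaku</div></div><div><div>otherfan</div></div>"

        Dim r As New Regex("<div><div>([^<]+)</div>")

        For Each itemcode As Match In r.Matches(sample)
            ' itemcode.Value is the whole match, tags included;
            ' Groups(1) holds only the text captured by the parentheses.
            Console.WriteLine(itemcode.Groups(1).Value)
        Next
    End Sub
End Module
```

On this sample the loop prints hannahotaku and then otherfan. Because the matched text contains no double quotes, splitting on the quote character would not yield a second element here, which is why the capture group is the more direct way to get the name.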

To retrieve the matches, try writing them to a file:

Imports System.Text.RegularExpressions

Public Class Form1

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim Request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://statigr.am/tag/anime")
    Dim response As System.Net.HttpWebResponse = Request.GetResponse

    Dim rs As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())

    Dim rssourcecode As String = rs.ReadToEnd

    Dim r As New System.Text.RegularExpressions.Regex("<div><div>([^<]+)</div>")
    Dim matches As MatchCollection = r.Matches(rssourcecode)

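    ' "Using Dim" is not valid VB.NET; declare the variable directly in the Using statement.
    Using addInfo As System.IO.StreamWriter = System.IO.File.CreateText("c:\Textfile.txt")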
        For Each itemcode As Match In matches
            addInfo.WriteLine(itemcode.Groups(1).Value)
        Next
    End Using


End Sub
End Class

VB.NET: scraping from a website
