How to make a proxy grabber for the http://nntime.com/ page in VB.NET

Time: 2015-04-26 12:42:32

Tags: vb.net

I want to write a proxy grabber in VB.NET for the http://nntime.com/ page. Can anyone help? This is what I have so far:

Imports System.Text.RegularExpressions
Public Class Form1
Private Sub Button4_Click(sender As Object, e As EventArgs) Handles Button4.Click
    Me.Close()
End Sub

Private Sub Button3_Click(sender As Object, e As EventArgs) Handles Button3.Click
    ListBox1.Items.Clear()
End Sub

Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
    Dim save As New SaveFileDialog
    save.FileName = "Grabbed Proxies"
    save.Filter = "Grabbed Proxies (*.txt)|*.txt|All Files (*.*)|*.*"
    save.CheckPathExists = True
    ' Do nothing if the user cancels the dialog
    If save.ShowDialog(Me) <> System.Windows.Forms.DialogResult.OK Then Return
    ' Using closes the file even if a write fails
    Using sw As New IO.StreamWriter(save.FileName)
        For it As Integer = 0 To ListBox1.Items.Count - 1
            sw.WriteLine(ListBox1.Items.Item(it))
        Next
    End Using
End Sub

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim the_request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create("http://proxy-ip-list.com"), System.Net.HttpWebRequest)
    ' Get the response and read the whole body into a string
    Dim code As String
    Using the_response As System.Net.WebResponse = the_request.GetResponse(),
          stream_reader As New System.IO.StreamReader(the_response.GetResponseStream())
        code = stream_reader.ReadToEnd()
    End Using
    ' Match IPv4 address:port pairs (a port can have up to five digits)
    Dim expression As New Regex("[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5}")
    ' Add each matched proxy to the list box as text
    Dim mtac As MatchCollection = expression.Matches(code)
    For Each itemcode As Match In mtac
        ListBox1.Items.Add(itemcode.Value)
    Next
End Sub
End Class

But it does not work on the http://nntime.com/ page.

Thanks in advance :)

1 Answer:

Answer 0: (score: 0)

Here is an example for hidemyass:

' Requires: Imports System.Linq, System.Net, System.Text.RegularExpressions
Dim html As String = New WebClient().DownloadString("http://proxylist.hidemyass.com/")
For Each row As Match In Regex.Matches(html, "(?:<td class=""leftborder timestamp""(?s).+?<style>)((?s).+?)\s*<td>\s+(\d{2,5})</td>")
    Dim cell As String = row.Groups(1).Value
    ' Inline every CSS rule (.name{rule}) so that hidden elements become style="display:none"
    For Each rule As Match In Regex.Matches(cell, "\.([^\{]+)\{([^\}]+)\}")
        cell = cell.Replace(String.Format("class=""{0}""", rule.Groups(1).Value), String.Format("style=""{0}""", rule.Groups(2).Value))
    Next
    ' Drop the hidden decoy elements, the leading style block and numeric class names
    cell = Regex.Replace(cell, "<(span|div) style=""display:none"">[\d\.]+</\1>", String.Empty)
    cell = Regex.Replace(cell.Remove(0, cell.IndexOf("/style>")), "class=""\d+""", String.Empty)
    ' What is left of the cell is the IP; group 2 of the row match is the port
    ListBox1.Items.Add(String.Concat(Regex.Matches(cell, "[\d\.]+").Cast(Of Match)().Select(Function(m) m.Value)) & ":" & row.Groups(2).Value)
Next
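
For http://nntime.com/ itself, a plain IP:port regex can come back empty if the port is not present as literal text in the HTML. The sketch below rests on an assumption that nothing in this thread confirms: that the page sets single-letter digit variables in a script (for example a=8;b=0;) and writes each port with document.write(":"+a+b+...) right after the IP address. The names NntimeSketch and ExtractProxies and the sample string are hypothetical, so check the real page source and adjust the patterns before relying on it.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions

Module NntimeSketch
    ' ASSUMPTION: the page defines single-letter digit variables ("a=8;b=0;")
    ' and emits each port as document.write(":"+a+b+...) after the IP address.
    Function ExtractProxies(html As String) As List(Of String)
        ' Build the letter -> digit lookup table from the variable assignments.
        Dim digits As New Dictionary(Of String, String)
        For Each m As Match In Regex.Matches(html, "\b([a-z])=(\d);")
            digits(m.Groups(1).Value) = m.Groups(2).Value
        Next

        Dim result As New List(Of String)
        ' Group 1: the IP address. Group 2: the "+a+b+..." list of variable names.
        Dim rowPattern As String = "(\d{1,3}(?:\.\d{1,3}){3})\s*<script[^>]*>\s*document\.write\("":""((?:\+[a-z])+)\)"
        For Each m As Match In Regex.Matches(html, rowPattern)
            Dim port As New StringBuilder()
            For Each name As String In m.Groups(2).Value.Split(New Char() {"+"c}, StringSplitOptions.RemoveEmptyEntries)
                Dim digit As String = Nothing
                ' Unknown variable names are skipped rather than guessed.
                If digits.TryGetValue(name, digit) Then port.Append(digit)
            Next
            result.Add(m.Groups(1).Value & ":" & port.ToString())
        Next
        Return result
    End Function

    Sub Main()
        ' Made-up sample in the assumed format, not real data from the site.
        Dim sample As String = "<script>a=8;b=0;</script>" &
            "<td>10.0.0.1<script type=""text/javascript"">document.write("":""+a+b+a+b)</script></td>"
        For Each proxy As String In ExtractProxies(sample)
            Console.WriteLine(proxy)
        Next
    End Sub
End Module
```

With the made-up sample above, Main prints 10.0.0.1:8080. In the form from the question, the same function could be fed the downloaded page text, adding each returned string to ListBox1.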