vb.net HttpWebRequest: getting the HTML behind a Google link

Time: 2012-07-02 07:42:48

Tags: vb.net httpwebrequest

Imports System.Net
Imports System.IO

Public Class Form1
    ' Downloads the page at the given URL and returns its HTML,
    ' or Nothing if the request fails.
    Public Function GetHTML(ByVal url As Uri) As String
        Dim HTML As String = Nothing

        Try
            Dim Request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)

            ' Using blocks dispose the response and the reader even if reading fails.
            Using Response As HttpWebResponse = DirectCast(Request.GetResponse(), HttpWebResponse)
                Using Reader As New StreamReader(Response.GetResponseStream())
                    HTML = Reader.ReadToEnd()
                End Using
            End Using
        Catch ex As Exception
            HTML = Nothing
        End Try

        Return HTML
    End Function

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim url As Uri = New Uri(TextBox1.Text)

        TextBox2.Text = GetHTML(url)
    End Sub
End Class

The code above is what I use to fetch the HTML of a web page. I run into a problem when I enter a URL like

http://www.google.com.sg/url?sa=t&rct=j&q=vb.net%20convert%20string%20to%20uri&source=web&cd=1&ved=0CFcQFjAA&url=http%3A%2F%2Fwww.vbforums.com%2Fshowthread.php%3Fp%3D3434187&ei=R0fxT872Cs2HrAesq4m-DQ&usg=AFQjCNGGedjegaM8osT689qWhbqpf6NI7Q

It gives me

   <script>window.googleJavaScriptRedirect=1</script>
    <script>
    var f={};
    f.navigateTo=function(b,a,g){
      if(b!=a&&b.google)
      {
        if(b.google.r)
         {
           b.google.r=0;
           b.location.href=g;
           a.location.replace("about:blank");
         }
      }
      else
      {
        a.location.replace(g);
      }
    };

    f.navigateTo(window.parent,window,"http://www.vbforums.com/showthread.php?p\x3d3434187");

    </script>
    <noscript>
    <META http-equiv="refresh" content="0;URL='http://www.vbforums.com/showthread.php?p=3434187'">
    </noscript>

instead of the HTML of http://www.vbforums.com/showthread.php?p=3434187

How can I make my code follow the redirect and get the HTML?

1 answer:

Answer 0 (score: 1)

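Extract the URL from the meta tag, then make a new request. HttpWebRequest only follows HTTP-level (3xx) redirects automatically; it does not run JavaScript or act on a meta refresh tag, which is why you get Google's redirect page back. For scraping, I recommend HtmlAgilityPack, which you can download from http://html-agility-pack.net/ or install with NuGet.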
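
As a rough illustration of the "extract the URL, then request again" step, here is a minimal sketch that uses a regular expression rather than HtmlAgilityPack. The module name MetaRefresh and the function ExtractMetaRefreshUrl are hypothetical names chosen for this example, and the patterns assume a meta tag shaped like the one in the question; a real HTML parser would likely be more robust.

```vbnet
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical helper, not part of the original code.
Public Module MetaRefresh
    ' Returns the target URL of a <meta http-equiv="refresh"> tag,
    ' or Nothing if the HTML contains no such tag.
    Public Function ExtractMetaRefreshUrl(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return Nothing

        ' Step 1: locate the meta refresh tag itself (case-insensitive).
        Dim tag As Match = Regex.Match(html,
            "<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*>",
            RegexOptions.IgnoreCase)
        If Not tag.Success Then Return Nothing

        ' Step 2: pull the address out of content="0;URL='...'".
        ' Assumes the URL contains no quotes or whitespace.
        Dim target As Match = Regex.Match(tag.Value,
            "url\s*=\s*['""]?([^'"">\s]+)",
            RegexOptions.IgnoreCase)
        If Not target.Success Then Return Nothing

        ' Attribute values may be HTML-encoded (for example &amp;), so decode them.
        Return WebUtility.HtmlDecode(target.Groups(1).Value)
    End Function
End Module
```

In Button1_Click, one way to use it would be to call GetHTML(url) once, pass the result to ExtractMetaRefreshUrl, and, if that returns something other than Nothing, call GetHTML(New Uri(thatValue)) a second time and display that result instead.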