POST an HTTP form and use web scraping to get the target HTML

Date: 2014-12-31 12:49:27

Tags: c# .net httpwebrequest httpwebresponse

I am currently building a website that posts to a third-party website and extracts details from it, using the following approach:

String htmlCode = "<html>" +
"<head>" +
"<title>Form</title>" +
"</head>" +
"<body onload=\"javascript:document.forms[0].submit()\">" +
"<form method=\"post\" action=\"%verylongactionurl%\">" +
"<input type=\"hidden\" name=\"key\" value=\"value\">" +
"</form>" +
"</body>" +
"</html>";
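The question fills in the placeholders of this HTML string by hand, which breaks the markup if a substituted value contains a quote or an ampersand. A minimal sketch of building the same self-submitting form safely, assuming the fields are available as name/value pairs. AutoPostForm.Build is a hypothetical helper; it HTML-encodes the action URL and every hidden field with System.Net.WebUtility.HtmlEncode.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: builds the self-submitting form from the question,
// HTML-encoding the action URL and every field so that quotes or '&'
// in the substituted values cannot break the markup.
public static class AutoPostForm
{
    public static string Build(string actionUrl, IDictionary<string, string> fields)
    {
        var sb = new StringBuilder();
        sb.Append("<html><head><title>Form</title></head>");

        // Submit the first form as soon as the page has loaded
        sb.Append("<body onload=\"document.forms[0].submit()\">");

        sb.Append("<form method=\"post\" action=\"")
          .Append(WebUtility.HtmlEncode(actionUrl))
          .Append("\">");

        // One hidden input per field, with both name and value encoded
        foreach (KeyValuePair<string, string> field in fields)
        {
            sb.Append("<input type=\"hidden\" name=\"")
              .Append(WebUtility.HtmlEncode(field.Key))
              .Append("\" value=\"")
              .Append(WebUtility.HtmlEncode(field.Value))
              .Append("\">");
        }

        sb.Append("</form></body></html>");
        return sb.ToString();
    }
}
```

The result can be passed to Response.Write exactly like the hand-built string.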

In my C# code I substitute all the required values into the HTML string above, and then write the content to my page as follows, which works fine:

Response.Write(htmlCode);

Is there a way to capture, in code, the HTML of the target page that I get from the step above?

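This is for a new web-scraping requirement: extract the required details from the target website and display the required values in our application.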

I tried the following code, but it does not work: the response URL I get back is an error page on the target site.

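using System.IO;
using System.Net;
using System.Text;
using System.Web;

// sourceUrl should be the form's action URL (%verylongactionurl% above), not the page hosting the form
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(sourceUrl);
request.AllowAutoRedirect = true;
request.CookieContainer = new CookieContainer(); // keep any cookies the site sets during redirects
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";

// Encode only the value; encoding the whole "key=...&" string also escapes
// '=' and '&', so the server would not see a "key" field at all
string postData = "key=" + HttpUtility.UrlEncode(value);

// This is sent to the Post
byte[] bytes = Encoding.UTF8.GetBytes(postData);
request.ContentLength = bytes.Length; // byte count, not string length

using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(bytes, 0, bytes.Length);
}

// Read the HTML of the page the form posts to
string html;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    html = reader.ReadToEnd();
}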
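One likely problem in this POST attempt is the body encoding: HttpUtility.UrlEncode was applied to the whole "key=...&" string, which also escapes the = and & separators. A minimal sketch of building the body field by field, assuming the fields are available as name/value pairs. FormBody.Build is a hypothetical helper, and it uses System.Net.WebUtility.UrlEncode instead of HttpUtility so it does not need a System.Web reference.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
// Each name and value is encoded separately, so the '=' and '&'
// separators stay real delimiters instead of being escaped.
public static class FormBody
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        // WebUtility.UrlEncode turns spaces into '+' and escapes '&', '=' and similar characters
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

Applied to the question, postData would be FormBody.Build(new[] { new KeyValuePair<string, string>("key", value) }), and ContentLength would still come from Encoding.UTF8.GetBytes(postData).Length.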

1 Answer:

Answer 0 (score: 0)

Try something like this:

using System.Linq;
using System.Net;

// Create a WebClient instance (disposed when done)
using (WebClient wc = new WebClient())
{
    // Download the HTML of the WhatsApp for Android page
    string htmlString = wc.DownloadString("http://www.whatsapp.com/android");

    // Load the HTML into an HtmlAgilityPack document
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(htmlString);

    // Find the first <p> whose class attribute contains "version"
    // and take its inner text (_currentVersion is a string field of the containing class)
    _currentVersion = htmlDoc.DocumentNode.Descendants("p")
        .First(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("version"))
        .InnerText;

    // Remove the "Version" label so that only the version number remains
    _currentVersion = _currentVersion.Replace("Version", "").Trim();
}

In this example I am extracting the latest version number of the WhatsApp application straight from its website.

So the only thing you need to pass in is the URL of the page you want to extract the data from.
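To try the parsing step offline without the HtmlAgilityPack NuGet package, the same lookup can be approximated with a regular expression. This is only a sketch: VersionScraper and its pattern are hypothetical, assume the version sits in a p element with a double-quoted class attribute, and will break on markup that HtmlAgilityPack handles fine, so the parser-based approach remains the better choice for real pages.

```csharp
using System.Text.RegularExpressions;

// Hypothetical, dependency-free stand-in for the HtmlAgilityPack query:
// finds the first <p> whose class attribute contains "version" and strips
// the "Version" label. Regex matching is brittle for real-world HTML.
public static class VersionScraper
{
    private static readonly Regex VersionParagraph = new Regex(
        @"<p\b[^>]*\bclass\s*=\s*""[^""]*version[^""]*""[^>]*>(.*?)</p>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Extract(string html)
    {
        Match m = VersionParagraph.Match(html);
        if (!m.Success)
        {
            return null; // no matching <p> element found
        }

        // Group 1 holds the paragraph's inner text
        return m.Groups[1].Value.Replace("Version", "").Trim();
    }
}
```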