HttpWebRequest or DownloadString retrieves only part of the HTML

Asked: 2014-02-28 15:23:07

Tags: c# html

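I am trying to retrieve the HTML source of a website from its URL. The problem is that my code returns only part of the HTML instead of the whole page. The site in question is http://www.4xinvestmentgroup.com. Do you have any idea what might be going wrong?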

First I tried the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string url = "http://www.4xinvestmentgroup.com";

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            HttpWebResponse res = (HttpWebResponse)req.GetResponse();

            StreamReader sr = new StreamReader(res.GetResponseStream(), Encoding.GetEncoding(res.CharacterSet));
            Console.WriteLine(sr.ReadToEnd());
            sr.Close();

            Console.WriteLine("Press enter to close...");
            Console.ReadLine();
        }
    }
}

After that I tried the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string url = "http://www.4xinvestmentgroup.com";

            WebClient client = new WebClient();
            string reply = client.DownloadString(url);

            Console.WriteLine(reply);

            Console.WriteLine("Press enter to close...");
            Console.ReadLine();
        }
    }
}

With both approaches, the returned HTML is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/x
html1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="pl" lang="pl">
<head>
      <base href="http://www.4xinvestmentgroup.com/" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <meta name="keywords" content="Forex, 4x investment group, 4xinvestmentgroup,
trading, forex analysis, forex market, forex signal provider, Economic Calendar,
trading profit, Exchange Market, Exchange Rates" />
  <meta name="description" content="4x Investment Group, Forex Signal Provider,
Trading services - The Foreign Exchange Market can ensure a Huge Trading Profit.
 " />
  <title>4x Investment Group</title>
  <link href="/index.php?format=feed&amp;type=rss" rel="alternate" type="applica
tion/rss+xml" title="RSS 2.0" />
  <link href="/index.php?format=feed&amp;type=atom" rel="alternate" type="applic
ation/atom+xml" title="Atom 1.0" />
  <link href="/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"
/>
  <link rel="stylesheet" href="/media/system/css/modal.css" type="text/css" />
  <link rel="stylesheet" href="/templates/gk_finance_business/css/k2.css" type="
text/css" />
  <link rel="stylesheet" href="http://www.4xinvestmentgroup.com/templates/gk_fin
ance_business/css/mobile/handheld.css" type="text/css" />
  <script src="/media/system/js/mootools-core.js" type="text/javascript"></scrip
t>
  <script src="/media/system/js/core.js" type="text/javascript"></script>
  <script src="/media/system/js/mootools-more.js" type="text/javascript"></scrip
t>
  <script src="/media/system/js/modal.js" type="text/javascript"></script>

  <script src="/components/com_k2/js/k2.js" type="text/javascript"></script>
  <script src="/media/system/js/caption.js" type="text/javascript"></script>
  <script src="http://www.4xinvestmentgroup.com/templates/gk_finance_business/js
/mobile/gk.handheld.js" type="text/javascript"></script>


    <meta name="viewport" content="width=device-width, minimum-scale=1.0, maximu
m-scale=1.0" />
    </head>
<body>
    <div id="gkWrap">
        <div id="gkTopWrap">
                                        <h1 id="gkHeader" class="cssLogo">
                   <a href="/./">4x Investment Group</a>
                          </h1>

                                <a href="#" id="gk-btn-switch" ><span>Switch to
desktop</span></a>


                                <a href="http://www.4xinvestmentgroup.com/index.
php?option=com_users&amp;view=login" id="gk-btn-login" ><span>Login</span></a>
                        </div>

        <div id="gkNav">
                <div id="gkNavContent">
                        <select id="gkMenu" onchange="window.location.href=this.
value;">
                        <option  value="/index.php/home-mobile">4x Investment Gr
oup</option><option  value="#">Explore Forex<option  value="/index.php/explore-f
orex-2/benefits-of-trading">&nbsp;&nbsp;&raquo;Benefits of Trading</option><opti
on  value="/index.php/explore-forex-2/risk-statement">&nbsp;&nbsp;&raquo;Risk St
atement</option></option><option  value="#">Forex Tools<option  value="/index.ph
p/forex-tools-2/currency-converter">&nbsp;&nbsp;&raquo;Currency Converter</optio
n></option><option  value="/index.php/4x-investment-group-provider-2">Case Study
 (2)</option><option  value="#">About us<option  value="/index.php/about-us-2/fe
w-words">&nbsp;&nbsp;&raquo;Few words</option><option  value="/index.php/about-u
s-2/contact-form">&nbsp;&nbsp;&raquo;Support</option></option>
        </select>
                </div>
        </div>

        <div id="gkContent">

                <div id="gkMain">

<div id="system-message-container">
</div>

<div class="blog-featured">





</div>


                </div>


                <div id="gkFooter">
                        <p id="gkCopyrights">4x Investment Group Ac 2012. All ri
ghts reserved.</p>

                        <p id="gkOptions">
                                <a href="#gkHeader">Top</a>
                                <a href="javascript:setCookie('gkGavernMobileFin
ance_Business', 'desktop', 365);window.location.reload();">Desktop version</a>
                        </p>
                </div>
        </div>
        </div>

        </body>
</html>

However, if you view the page source in a browser, it contains far more content.

2 answers:

Answer 0: (score: 0)

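It looks like the request is being sent to the mobile version of the site (note the handheld.css stylesheet and the "Switch to desktop" link in the output above). Try setting the User-Agent string to one that a desktop browser would send. For example: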

HttpWebRequest:

req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";

WebClient:

client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
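To show where those two lines fit, here is a minimal sketch of both approaches from the question with the User-Agent applied. The class and method names (UserAgentExample, CreateDesktopRequest, DownloadWithHttpWebRequest, DownloadWithWebClient) are hypothetical, the UTF-8 fallback is an added assumption, and whether the site then returns its desktop markup depends on how the server inspects the header.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public class UserAgentExample
{
    // The desktop User-Agent string suggested in the answer above.
    public const string DesktopUserAgent =
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";

    // Hypothetical helper: builds a request that identifies itself as a desktop browser.
    // No network traffic happens until GetResponse() is called.
    public static HttpWebRequest CreateDesktopRequest(string url)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.UserAgent = DesktopUserAgent;
        return req;
    }

    public static string DownloadWithHttpWebRequest(string url)
    {
        HttpWebRequest req = CreateDesktopRequest(url);
        using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
        {
            // Assumption: fall back to UTF-8 when the server declares no charset.
            Encoding encoding = string.IsNullOrEmpty(res.CharacterSet)
                ? Encoding.UTF8
                : Encoding.GetEncoding(res.CharacterSet);
            using (StreamReader sr = new StreamReader(res.GetResponseStream(), encoding))
            {
                return sr.ReadToEnd();
            }
        }
    }

    public static string DownloadWithWebClient(string url)
    {
        using (WebClient client = new WebClient())
        {
            client.Headers.Add("User-Agent", DesktopUserAgent);
            return client.DownloadString(url);
        }
    }

    public static void Main()
    {
        string url = "http://www.4xinvestmentgroup.com";

        // Print only the lengths so the two results are easy to compare.
        Console.WriteLine(DownloadWithHttpWebRequest(url).Length);
        Console.WriteLine(DownloadWithWebClient(url).Length);
    }
}
```

If the lengths are noticeably larger than what the original code returned, the server was most likely choosing the mobile template based on the missing User-Agent header.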

Answer 1: (score: 0)

One way to get the full HTML of this site is to use the WebBrowser control.

Create a Windows Forms application and add a WebBrowser control from the Toolbox. Then put the following code in the form's Load event handler.

    webBrowser1.Navigate("http://www.4xinvestmentgroup.com");

    while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
    {
        // Keep processing window messages until the document has finished loading.
        Application.DoEvents();
    }

    string html = webBrowser1.DocumentText;
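
As an alternative to polling with Application.DoEvents(), the same idea can be sketched with the control's DocumentCompleted event. This is a minimal sketch, not the answerer's code: the BrowserForm class and OnDocumentCompleted handler are hypothetical names, and the control is created in code here rather than dragged from the Toolbox.

```csharp
using System;
using System.Windows.Forms;

public class BrowserForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser();

    public BrowserForm()
    {
        webBrowser1.Dock = DockStyle.Fill;
        Controls.Add(webBrowser1);

        // Read the HTML once loading has finished instead of busy-waiting.
        webBrowser1.DocumentCompleted += OnDocumentCompleted;

        // Start navigation when the form loads, as in the answer above.
        Load += (sender, e) => webBrowser1.Navigate("http://www.4xinvestmentgroup.com");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire more than once (for example, once per frame),
        // so ignore calls made before the whole document is ready.
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
        {
            return;
        }

        string html = webBrowser1.DocumentText;
        Console.WriteLine(html.Length);
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new BrowserForm());
    }
}
```

The event-based form keeps the UI thread free while the page loads, at the cost of moving the follow-up logic into a handler.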