Question

I am trying to get data that is generated by a script, and I am using the WebBrowser control, following the approach quoted from: C# webbrowser Ajax call

My first version of the code was:

sample3d = function(n)
{
  df = data.frame() 

  while(n>0)
  {
    X = runif(1,-1,1) 
    Y = runif(1,-1,1)
    Z = runif(1,-1,1)
    a = X^2 + Y^2 + Z^2 

    if( a < 1 ) 

    {
      b = (X^2+Y^2+Z^2)^(0.5) 

      vector = data.frame(X = X/b, Y = Y/b, Z = Z/b) 
      df = rbind(vector,df)
      n = n- 1
    }
  }
  df
}
sample3d(n)

The page source I get is not what the browser displays. When I modify the code as follows:

webBrowser1.Navigate("https://mobile.bet365.com/#type=Coupon;key=1-1-13-33977144-2-8-0-0-1-0-0-4100-0-0-1-0-0-0-0-0-0-0-0;ip=0;lng=1;anim=1");
while (webBrowser1.ReadyState != WebBrowserReadyState.Complete)
{
    System.Threading.Thread.Sleep(10);
    Application.DoEvents();
}
File.WriteAllText(@"C:\pagesource.txt", webBrowser1.DocumentText);

Of course, I have to press OK when the dialog is shown. The page source is then correct.

I don't understand what is going on. I just want to get the page source automatically (without any clicks or user interaction).

Answer 1

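Since you don't actually need the WebBrowser control, I would try switching to a different way of getting the page source (which also avoids the overhead of the WebBrowser control).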

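Note that reading data out of HTML source is fragile: as soon as the page layout changes or additional JavaScript kicks in, you will run into problems. To retrieve data from a web page, you should look for something like an RSS feed, which is much easier to parse than the HTML page source.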
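
To illustrate why a feed is easier to work with than HTML, here is a minimal sketch that pulls the title and link out of every item of an RSS 2.0 document using System.Xml.Linq. The class name FeedReader and the method name ReadItems are made up for this example, and it assumes the site actually offers such a feed, which is not confirmed for the URL in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class FeedReader
{
    // Returns one (title, link) pair per <item> element of an RSS 2.0 document.
    // A missing <title> or <link> is reported as an empty string.
    public static List<KeyValuePair<string, string>> ReadItems(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);

        return doc.Descendants("item")
            .Select(item => new KeyValuePair<string, string>(
                (string)item.Element("title") ?? "",
                (string)item.Element("link") ?? ""))
            .ToList();
    }
}
```

The XML string could come from any of the download options shown further down, for example FeedReader.ReadItems(webClient.DownloadString(feedUrl)), where feedUrl is a placeholder for the feed address.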

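However, since the URL you mentioned is currently under maintenance, I could not test the following code against it. I tested it against my own page instead, and it works there. Of course, my own page does not have as much JavaScript as your URL.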

Below I show three different ways of getting the page source:

        // url is assumed to hold the address of the page to download
        string pageSource1 = null, pageSource2 = null, pageSource3 = null;
        try
        {
            using (System.Net.WebClient webClient = new System.Net.WebClient())
            {
                // optionally send a browser-like User-Agent (the HTTP header name is "User-Agent", not "USER_AGENT")
                webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 OPR/51.0.2830.55");

                //
                // option 1: using webclient download string (simple call)
                pageSource1 = webClient.DownloadString(url);

                //
                // option 2: getting a stream... (if you prefer using a stream, eg. not reading the whole page until the end)
                var webClientStream = webClient.OpenRead(url);
                if (webClientStream != null)
                {
                    using (System.IO.StreamReader streamReader = new System.IO.StreamReader(webClientStream))
                    {
                        pageSource2 = streamReader.ReadToEnd();
                    }
                }
            }

            //
            // option3: using webrequest (with webrequest/webresponse you can rebuild the browser behavior eg. walking pages)
            System.Net.WebRequest webRequest = System.Net.WebRequest.Create(url);
            webRequest.Method = "GET";

            // dispose the response so the underlying connection is released
            using (var webResponse = webRequest.GetResponse())
            using (var webResponseStream = webResponse.GetResponseStream())
            {
                if (webResponseStream != null)
                {
                    using (System.IO.StreamReader streamReader = new System.IO.StreamReader(webResponseStream))
                    {
                        pageSource3 = streamReader.ReadToEnd();
                    }
                }
            }
        }
        catch (System.Net.WebException exc)// for web
        {
            Console.WriteLine($"Unable to download page source: {exc.Message}");
            // todo - safely handle...
        }
        catch (System.IO.IOException exc)//for stream
        {
            Console.WriteLine($"Unable to download page source: {exc.Message}");
            // todo - safely handle...
        }
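
As a possible fourth variant, not one of the three options above, the same download can be sketched with HttpClient. This is only a sketch under assumptions: PageSourceDownloader, GetPageSourceAsync and FixedResponseHandler are names invented for this example, and the user agent string is a placeholder. Like the WebClient and WebRequest options, it returns the markup as sent by the server, so content that JavaScript adds later will not be in the result.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class PageSourceDownloader
{
    // Downloads the raw response body of `url` using the supplied HttpClient.
    public static async Task<string> GetPageSourceAsync(HttpClient client, string url)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            // Placeholder user agent; replace with whatever the target site expects.
            request.Headers.TryAddWithoutValidation(
                "User-Agent", "Mozilla/5.0 (compatible; ExampleFetcher/1.0)");

            using (HttpResponseMessage response = await client.SendAsync(request))
            {
                // Throws HttpRequestException for non-success status codes.
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}

// Stand-in handler that answers every request with a fixed body,
// so the helper can be tried out without network access.
public sealed class FixedResponseHandler : HttpMessageHandler
{
    private readonly string _body;

    public FixedResponseHandler(string body)
    {
        _body = body;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_body)
        };
        return Task.FromResult(response);
    }
}
```

For a real download, pass a plain new HttpClient() instead of one built on FixedResponseHandler, and await GetPageSourceAsync(client, url) from an async method.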

Hope it helps!

C# WebBrowser control does not produce the HTML source
