Reading web page content returns a "JavaScript is disabled" message

Time: 2019-04-20 12:18:22

Tags: c# html http web-scraping

I wrote the following code to read the content of a web page:

string url = "https://hackerone.com/directory?asset_type=URL&order_direction=DESC&order_field=started_accepting_at";
HttpClient httpclient = new HttpClient();
var html = httpclient.GetStringAsync(url);
MessageBox.Show(html.Result); // the response body contains "JavaScript is disabled in your browser"

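The problem is that html.Result contains a "JavaScript is disabled" message instead of the page content, so someone suggested changing the URL to the following: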

  

http://service.prerender.io/https://hackerone.com/directory?asset_type=URL&order_direction=DESC&order_field=started_accepting_at

But that did not help. Any ideas?

Edit: the following code works, but it is very slow (around 6 seconds)!

        string html = string.Empty;
        string url = "https://hackerone.com/directory?asset_type=URL&order_direction=DESC&order_field=started_accepting_at";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            html = reader.ReadToEnd();
        }

1 answer:

Answer 0: (score: 0)

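You need to use a headless browser (for example Selenium or Splash), which can run the site's scripts and give you the fully rendered page. See this question and its answers about headless browsers for C#: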

Headless browser for C# (.NET)?

There is also a list on GitHub:

https://github.com/dhamaniasad/HeadlessBrowsers
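
As a rough illustration of the headless-browser approach, here is a minimal sketch using Selenium with headless Chrome. It is not from the original answer. It assumes the Selenium.WebDriver and Selenium.Support NuGet packages are installed and that Chrome, with a matching driver, is available on the machine. The 15-second timeout and the `document.readyState` check are illustrative choices; a page that loads its data after the initial render may need a more specific wait condition.

```csharp
// Assumptions: Selenium.WebDriver and Selenium.Support NuGet packages,
// plus a local Chrome installation with a matching driver.
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class Program
{
    static void Main()
    {
        string url = "https://hackerone.com/directory?asset_type=URL&order_direction=DESC&order_field=started_accepting_at";

        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        // Disposing the driver shuts down the browser process.
        using (IWebDriver driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);

            // Wait up to 15 seconds (an arbitrary value) for the document to finish loading.
            // Content fetched afterwards may require waiting for a specific element instead.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
            wait.Until(d => ((IJavaScriptExecutor)d)
                .ExecuteScript("return document.readyState")
                .Equals("complete"));

            // PageSource returns the page markup as the browser currently sees it,
            // i.e. after the site's scripts have had a chance to run.
            string html = driver.PageSource;
            Console.WriteLine(html);
        }
    }
}
```

Starting a real browser is noticeably heavier than a plain HTTP request, so if many pages are fetched it is probably worth reusing a single driver instance rather than creating one per URL.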