C# PhantomJS: scroll down a page until an element with a given class disappears

Time: 2020-04-19 21:47:52

Tags: c# phantomjs html-parsing

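I have a page that loads images as you scroll down. While it is still loading, its HTML contains an element with the class "ajax-loading vk-like b-appear_marker active"; once everything has loaded, that element disappears. How can I keep loading the page until the element is gone? The first example below runs fine, but it does not load the page fully, because it does not wait for the element to disappear. The second variant does not work at all. I can't say exactly what the problem is, but I suspect other approaches will never work either.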

Here is what I have tried:

private static void GetRenderedWebPage(string url, TimeSpan waitAfterPageLoad, string height, Action<string> callBack)
{
    const string cEndLine = "All output received";

    var sb = new StringBuilder();
    var p = new PhantomJS();
    p.OutputReceived += (sender, e) =>
    {
        if (e.Data == cEndLine)
        {
            callBack(sb.ToString());
        }
        else
        {
            sb.AppendLine(e.Data);
        }
    };

    // runs fine, but does not load the page fully
    p.RunScript(@"
        var page = require('webpage').create();
        page.viewportSize = { width: 1920, height: 1080 };
        page.onLoadFinished = function(status) {
            if (status=='success') {
                setTimeout(function() {
                    console.log(page.content);
                    console.log('" + cEndLine + @"');
                    phantom.exit();
                }, " + waitAfterPageLoad.TotalMilliseconds + @");
            }
        };
        var url = '" + url + @"';
        page.open(url);", new string[0]);

    /* doesn't work at all
    p.RunScript(@"
        var page = require('webpage').create();
        page.viewportSize = { width: 1920, height: 1080 };
        var url = '" + url + @"';
        page.open(url, function () {
            window.setInterval(function() {
                var count = page.content.match(/class='.ajax-loading vk-like b-appear_marker active'/g);
                if (count === null)
                {
                    page.evaluate(function() {
                        window.document.body.scrollTop = document.body.scrollHeight;
                    });
                }
                else
                {
                    console.log(page.content);
                    console.log('" + cEndLine + @"');
                    phantom.exit();
                }
            }, 500);
        });
    ", new string[0]);
    */
}

I use it like this: the output is written to a .txt file and loaded into htmlDocument, which I then parse for the content I need.

try
{
    GetRenderedWebPage(link, TimeSpan.FromSeconds(Convert.ToDouble(time)), height, output =>
    {
        File.WriteAllText("output.txt", output);
        htmlDocument.LoadHtml(output);
    });
}
catch
{
    Console.WriteLine("an error has occurred.");
}
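
For reference, the commented-out second attempt appears to have its branches reversed: it scrolls when the marker is not found and prints the page when it is found. Its regular expression also looks for a single-quoted class attribute, whereas page.content normally serializes attributes with double quotes, so the match would likely always be null. Below is a minimal sketch of the intended loop: scroll while the marker is present, stop when it is gone. The names createScrollPoller, runPoller, isMarkerPresent, scrollDown and maxTicks are hypothetical helpers introduced for illustration and are not part of PhantomJS. The sketch also assumes the marker element is already in the DOM after the initial load; if it only appears after the first scroll, the first check would report completion too early.

```javascript
// Sketch only: createScrollPoller and runPoller are hypothetical helpers,
// not PhantomJS APIs. The page-specific checks are passed in as callbacks.
function createScrollPoller(isMarkerPresent, scrollDown, maxTicks) {
    var ticks = 0;
    // Each call performs one polling step and reports what happened.
    return function tick() {
        if (!isMarkerPresent()) {
            return 'done';       // loader element is gone: nothing left to load
        }
        if (ticks >= maxTicks) {
            return 'timeout';    // give up so the script cannot run forever
        }
        ticks += 1;
        scrollDown();            // marker still present: trigger the next batch
        return 'scrolling';
    };
}

// Drives the poller on a timer; onFinish receives 'done' or 'timeout'.
function runPoller(tick, intervalMs, onFinish) {
    var timer = setInterval(function () {
        var state = tick();
        if (state !== 'scrolling') {
            clearInterval(timer);
            onFinish(state);
        }
    }, intervalMs);
}

// Possible wiring inside page.open(url, function (status) { ... }),
// assuming the class names from the question form the selector:
//
//   var tick = createScrollPoller(
//       function () {
//           return page.evaluate(function () {
//               return document.querySelector(
//                   '.ajax-loading.vk-like.b-appear_marker.active') !== null;
//           });
//       },
//       function () {
//           page.evaluate(function () {
//               window.scrollTo(0, document.body.scrollHeight);
//           });
//       },
//       200);
//   runPoller(tick, 500, function (state) {
//       console.log(page.content);
//       console.log('All output received');
//       phantom.exit();
//   });
```

Checking the DOM through page.evaluate avoids depending on how page.content happens to quote attributes, and the maxTicks cap keeps the C# caller from waiting indefinitely if the marker never goes away.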

0 answers:

No answers