Question

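I want to write a page parser for VK.com. My problem is that the page source contains only 50 results; the rest are loaded only once you scroll to the end of the page.

My code so far: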

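    private void syncToolStripMenuItem_Click(object sender, EventArgs e)
    {
        // Split the page source into one chunk per track, then extract the fields.
        // info_title and info_artist are not shown here.
        string[] information = info_basic(webBrowser1.DocumentText);
        string[] title = info_title(information);
        string[] artist = info_artist(information);

        // Pair artist and title by index; guard against arrays of different length.
        List<string> joint = new List<string>();
        int count = Math.Min(title.Length, artist.Length);
        for (int i = 0; i < count; i++)
        {
            joint.Add(artist[i] + " - " + title[i]);
        }

        listBox1.Items.Clear();
        listBox1.Items.AddRange(joint.ToArray());
    }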

    private string[] info_basic(string source)
    {
        // Split on the play-button div that starts each track entry.
        string[] temps = Regex.Split(source, "<div class=\"play_btn fl_l\">");
        List<string> sub = new List<string>(temps);
        // The first chunk is the markup before the first track, so drop it.
        sub.RemoveAt(0);
        return sub.ToArray();
    }

The relevant page source:

http://csharp.bplaced.net/files/vk%20source.txt

Answer 1

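I would suggest monitoring the traffic between the page and vk.com while you scroll to the bottom (for example, with the Fiddler HTTP proxy) to find out how the page loads its content dynamically. Most likely this is done with asynchronous AJAX calls from JavaScript. Then simulate the same requests in your code to load the whole list. The HttpWebRequest class is well suited to this task.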
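As a rough illustration of that approach, here is a minimal sketch that replays a paged request with HttpWebRequest until a response no longer contains any track markup. Everything about the request is an assumption: the URL, the `offset` and `count` form fields, and the class and method names `PagedLoader`, `BuildBody`, `PostForm` and `LoadAll` are placeholders, and the real endpoint and parameters have to be read from the traffic captured in Fiddler.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper; none of these names come from vk.com or from the question.
public static class PagedLoader
{
    // Builds the form body for one page. The field names are placeholders:
    // replace them with whatever the captured request actually sends.
    public static string BuildBody(int offset, int count)
    {
        return "offset=" + offset + "&count=" + count;
    }

    // Sends one form-encoded POST and returns the response body as text.
    // The cookie container carries the login session across requests.
    public static string PostForm(string url, string body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;

        byte[] payload = Encoding.UTF8.GetBytes(body);
        request.ContentLength = payload.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Calls fetchPage with increasing offsets and concatenates the results.
    // Stops at the first empty chunk, the first chunk without itemMarker,
    // or after maxPages requests, whichever comes first.
    public static string LoadAll(Func<int, string> fetchPage, string itemMarker,
                                 int pageSize, int maxPages)
    {
        var all = new StringBuilder();
        for (int page = 0; page < maxPages; page++)
        {
            string chunk = fetchPage(page * pageSize);
            if (string.IsNullOrEmpty(chunk) ||
                chunk.IndexOf(itemMarker, StringComparison.Ordinal) < 0)
            {
                break;
            }
            all.Append(chunk);
        }
        return all.ToString();
    }
}
```

With a real endpoint in place, `fetchPage` would be something like `offset => PagedLoader.PostForm(url, PagedLoader.BuildBody(offset, 50), cookies)`, and `itemMarker` could be the `<div class="play_btn fl_l">` string the question already splits on. Passing the fetch step in as a delegate keeps the paging loop independent of the network code.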

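However, since you are using the WebBrowser control, which probably already does all the work of loading the content, you can instead try scrolling the control's view programmatically so that the page's own JavaScript fires and loads more content. Stop when you reach the bottom, then parse the fully loaded page.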
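A minimal sketch of the scrolling approach follows, assuming the page loads more items whenever the view reaches the bottom and that the document height stops growing once everything is loaded. The class name `ScrollToEndLoader`, the one-second interval and the "three unchanged checks" stop rule are all assumptions chosen for illustration, not anything vk.com guarantees.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: keeps scrolling a WebBrowser to the bottom until the
// document height has stopped growing, then invokes a callback.
public sealed class ScrollToEndLoader
{
    private readonly WebBrowser browser;
    private readonly Action onFinished;
    private readonly System.Windows.Forms.Timer timer = new System.Windows.Forms.Timer();
    private int lastHeight = -1;
    private int unchangedTicks;

    public ScrollToEndLoader(WebBrowser browser, Action onFinished, int intervalMs = 1000)
    {
        this.browser = browser;
        this.onFinished = onFinished;
        timer.Interval = intervalMs;
        timer.Tick += OnTick;
    }

    public void Start()
    {
        timer.Start();
    }

    private void OnTick(object sender, EventArgs e)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null || doc.Body == null || doc.Window == null)
        {
            return; // page not ready yet; try again on the next tick
        }

        int height = doc.Body.ScrollRectangle.Height;
        if (height == lastHeight)
        {
            // Assumed stop rule: three checks in a row with no growth.
            unchangedTicks++;
            if (unchangedTicks >= 3)
            {
                timer.Stop();
                onFinished();
                return;
            }
        }
        else
        {
            unchangedTicks = 0;
            lastHeight = height;
        }

        // Jump to the current bottom so the page's own script loads more items.
        doc.Window.ScrollTo(0, height);
    }
}
```

It could be started from the form with `new ScrollToEndLoader(webBrowser1, ParseLoadedPage).Start();`, where `ParseLoadedPage` is a hypothetical method that runs the existing parsing code. When parsing afterwards, it is probably safer to read `webBrowser1.Document.Body.OuterHtml` than `webBrowser1.DocumentText`, since the latter may not include nodes that scripts inserted after the initial load.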

Reading page source that is auto-completed after reaching the end of the page
