Browser document loaded

Date: 2013-01-19 18:39:11

Tags: c# .net

I'm trying to populate a grid with some data pulled from LinkedIn, purely as a learning exercise. However, if I remove the line

MessageBox.Show("asdfasdfasdf")

the list "messages" ends up with only 1 item. If I leave the line in, I get what I expect: 15 messages.

Can someone explain this?

public void extract_messages_received(object sender, RoutedEventArgs e)
{
    triggered = false;
    System.Windows.Forms.WebBrowser browser = new System.Windows.Forms.WebBrowser();
    browser.Navigate(new Uri(@"http://www.linkedin.com/inbox/messages/received"));
    browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
}

private void LoadMessages(string url)
{
    txtOutput.Text = @"http://www.linkedin.com" + url.Substring(6, url.Length - 6);
    if (!urls.Contains(url))
    {
        urls.Add(url);
        WebBrowser browser = new WebBrowser();
        browser.Navigate(new Uri(txtOutput.Text));

        loaded_message = false;
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(ReadMessages);
    }
}

private void ReadMessages(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (loaded_message == false)
    {        
        string url = ((WebBrowser)sender).Url.ToString();
        int loc1 = url.IndexOf("itemID") + 7;
        int loc2 = url.IndexOf("&", loc1);
        IEnumerable<string> name = null;
        IEnumerable<string> odate = null;
        IEnumerable<string> photo = null;
        IEnumerable<string> subject = null;
        IEnumerable<string> headline = null;
        string body = "";
        string id = url.Substring(loc1, loc2 - loc1);
        //System.Windows.MessageBox.Show("READ");
        foreach (HtmlElement element in ((WebBrowser)sender).Document.GetElementsByTagName("div"))
        {
            if (element.GetAttribute("classname").Equals("inbox-item-body"))
            {
                body = element.InnerText;
            }
            if (element.GetAttribute("classname").Equals("inbox-item-header"))
            {
                var doc = new HtmlAgilityPack.HtmlDocument();
                doc.LoadHtml(element.InnerHtml);
                name = from foo in doc.DocumentNode.SelectNodes("//a[@class='fn']") select foo.InnerText;
                odate = from foo in doc.DocumentNode.SelectNodes("//p[@class='date']") select foo.InnerText;
                photo = from foo in doc.DocumentNode.SelectNodes("//img[@class='photo']") select foo.Attributes["src"].Value;
                subject = from foo in doc.DocumentNode.SelectNodes("//h3") select foo.InnerText;
                headline = from foo in doc.DocumentNode.SelectNodes("//span[@class='headline']") select foo.InnerText;
            }
        }

        // ****
        MessageBox.Show("asdfasdfasdf");
        // ****

        messages.Add(new Messages()
        {
            ID = id,
            Subject = subject.First().ToString(),
            Headline = headline.First().ToString(),
            Sender = name.First().ToString(),
            Photo = photo.First().ToString(),
            SendDate = odate.First().ToString(),
            Body = body
        });

        // dataMessages.ItemsSource = messages;
    }
    loaded_message = true;
}

void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (!triggered)
    {
        triggered = true;
        System.Windows.Forms.WebBrowser web = sender as System.Windows.Forms.WebBrowser;
        foreach (HtmlElement element in web.Document.GetElementsByTagName("ol"))
        {
            if (element.GetAttribute("classname").Contains("inbox-list "))
            {
                WebBrowser browser = new WebBrowser();
                browser.Navigate("about:blank");
                browser.Document.Write(element.InnerHtml);
                HtmlElementCollection hrefTags = null;
                hrefTags = browser.Document.GetElementsByTagName("a");
                foreach (HtmlElement a in hrefTags)
                {
                    if (a.OuterHtml.Contains("displayMBox"))
                    {
                        LoadMessages(a.GetAttribute("href"));
                    }
                }
            }
        }
    }       
}

1 Answer:

Answer 0 (score: 0)

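This is a timing issue.

While the message box is open, loaded_message has not yet been set to true; that only happens after you close it. The message box keeps processing window messages while it is displayed, so the other DocumentCompleted events are handled in the meantime. Each of them passes the loaded_message check and runs up to its own message box, because none of them has set loaded_message before the first message box is closed. Without the message box, the first handler sets loaded_message to true straight away and every later event is skipped.

If you close the message box quickly enough, you may end up with some number between 1 and 15.

Let's take a simpler example: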

    private void Form1_Load(object sender, EventArgs e)
    {

        for (int i = 0; i < 5; i++)
        {
            WebBrowser wb = new WebBrowser();
            wb.DocumentCompleted += wb_DocumentCompleted;
            wb.Navigate("http://www.stackoverflow.com");
        }
    }

    bool shown = false;
    void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (!shown)
        {
            Console.WriteLine(shown);
            MessageBox.Show(shown.ToString());
            shown = true;
        }
    }

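If you watch the console, you will see several False lines printed before you dismiss the first message box. When I close that message box I see 4 more message boxes, because those handlers had already passed the check before shown was set to true. If I comment out the message box, I get only a single False in the console.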
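The same mechanism can be modeled without WinForms. The following is a minimal sketch, not the real message loop: ReentrancyDemo, FakeModalBox and the pending queue are hypothetical names made up for illustration, and it assumes all completion events are already queued when the first handler runs.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a UI message queue. This is NOT WinForms; every name here is hypothetical.
public static class ReentrancyDemo
{
    static readonly Queue<Action> pending = new Queue<Action>();
    static bool shown;
    static int handled;

    // Stand-in for MessageBox.Show: while "open", it keeps dispatching queued events.
    static void FakeModalBox()
    {
        while (pending.Count > 0) pending.Dequeue()();
    }

    static void OnDocumentCompleted(bool showBox)
    {
        if (!shown)
        {
            handled++;
            if (showBox) FakeModalBox(); // other handlers run here, before shown is set
            shown = true;
        }
    }

    // Returns how many handlers got past the !shown check.
    public static int Run(int events, bool showBox)
    {
        pending.Clear();
        shown = false;
        handled = 0;
        for (int i = 0; i < events; i++)
            pending.Enqueue(() => OnDocumentCompleted(showBox));
        while (pending.Count > 0) pending.Dequeue()();
        return handled;
    }
}
```

In this model, ReentrancyDemo.Run(5, true) lets all 5 handlers through, while ReentrancyDemo.Run(5, false) lets only the first one through, which mirrors the 15 versus 1 messages in the question.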

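Now the question becomes why you added the loaded_message boolean and why you need to check it.

My guess is that you only want to load each message once. If that is the case, track every URL in a dictionary and keep a separate bool for each one: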

    Dictionary<string, bool> loadedUrls = new Dictionary<string, bool>();
    private void Form1_Load(object sender, EventArgs e)
    {

        for (int i = 0; i < 5; i++)
        {
            WebBrowser wb = new WebBrowser();
            wb.DocumentCompleted += wb_DocumentCompleted;
            string url = "http://stackoverflow.com/" + i;

            loadedUrls.Add(url, false);
            wb.Navigate(url);
        }
    }

    bool shown = false;
    void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {

        if (loadedUrls.ContainsKey(e.Url.OriginalString) && loadedUrls[e.Url.OriginalString] == false)
        {
            loadedUrls[e.Url.OriginalString] = true;
            Console.WriteLine(shown);
            shown = true;
        }
    }

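I left shown in there to demonstrate that the handler body now runs once for every document that completes. Your output window should show False followed by 4 True.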
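The per-URL check can also be pulled out of the event handler so it can be exercised without a browser. This is only a sketch of a variation: LoadTracker, Register and TryMarkLoaded are hypothetical names, and it swaps the Dictionary<string, bool> for two HashSet<string> instances, relying on HashSet<T>.Add returning false when the item is already present.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the code above.
public sealed class LoadTracker
{
    private readonly HashSet<string> expected = new HashSet<string>();
    private readonly HashSet<string> loaded = new HashSet<string>();

    // Call once per URL before navigating to it.
    public void Register(string url)
    {
        expected.Add(url);
    }

    // True only the first time a registered URL completes.
    public bool TryMarkLoaded(string url)
    {
        // HashSet<T>.Add returns false if the URL was already marked as loaded.
        return expected.Contains(url) && loaded.Add(url);
    }
}
```

Inside wb_DocumentCompleted, the if condition would then become a single call such as tracker.TryMarkLoaded(e.Url.OriginalString), assuming tracker is a LoadTracker field filled in Form1_Load.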