WebBrowser behavior issues

Time: 2013-09-02 11:52:38

Tags: c# browser web-scraping webbrowser-control screen-scraping

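I am trying to automate the WebBrowser control using .NET C#. The problem is that the control, or should I say the IE browser, behaves strangely on different computers. For example, on the first computer I click a link and fill in an Ajax popup form without any errors: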

private void btn_Start_Click(object sender, RoutedEventArgs e)
{
    webbrowserIE.Navigate("http://www.test.com/");
    webbrowserIE.DocumentCompleted += fillup_LoadCompleted; 
}

void fillup_LoadCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    // Click the "login" link to open the Ajax popup form
    System.Windows.Forms.HtmlElement ele = webbrowserIE.Document.GetElementById("login");
    if (ele != null)
        ele.InvokeMember("Click");

    if (this.webbrowserIE.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
    {
        webbrowserIE.Document.GetElementById("login").SetAttribute("value", myUserName);
        webbrowserIE.Document.GetElementById("password").SetAttribute("value", myPassword);

        // Find the button labelled "Login" and click it
        foreach (System.Windows.Forms.HtmlElement el in webbrowserIE.Document.GetElementsByTagName("button"))
        {
            if (el.InnerText == "Login")
            {
                el.InvokeMember("click");
            }
        }

        webbrowserIE.DocumentCompleted -= fillup_LoadCompleted;
    }
}

However, the code above does not work on the second PC. The only way to make it work there is this:

private void btn_Start_Click(object sender, RoutedEventArgs e)
{
    webbrowserIE.DocumentCompleted += click_LoadCompleted;
    webbrowserIE.Navigate("http://www.test.com/"); 
}

void click_LoadCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    if (this.webbrowserIE.ReadyState == System.Windows.Forms.WebBrowserReadyState.Complete)
    {
        System.Windows.Forms.HtmlElement ele = webbrowserIE.Document.GetElementById("login");
        if (ele != null)
            ele.InvokeMember("Click");

        webbrowserIE.DocumentCompleted -= click_LoadCompleted;
        webbrowserIE.DocumentCompleted += fillup_LoadCompleted;
    }
}

void fillup_LoadCompleted(object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e)
{
    webbrowserIE.Document.GetElementById("login_login").SetAttribute("value", myUserName);
    webbrowserIE.Document.GetElementById("login_password").SetAttribute("value", myPassword);

    // Find the button labelled "Login" and click it
    foreach (System.Windows.Forms.HtmlElement el in webbrowserIE.Document.GetElementsByTagName("button"))
    {
        if (el.InnerText == "Login")
        {
            el.InvokeMember("click");
        }
    }

    webbrowserIE.DocumentCompleted -= fillup_LoadCompleted;
}

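So, in the second solution, I have to chain two DocumentCompleted handlers. Can someone suggest how I should handle this? A more robust approach would also be very helpful. Thanks in advance.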

1 answer:

Answer 0 (score: 3)

I can recommend two things:

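  • Do not run your code on the DocumentCompleted event; run it on the DOM window.onload event instead.
  • To make sure your web page behaves in the WebBrowser control the same way it would in the full Internet Explorer browser, consider implementing Feature Control.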
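
As a rough illustration of the Feature Control suggestion: a hosted WebBrowser control renders in IE7 compatibility mode by default, unless the hosting executable is listed under the FEATURE_BROWSER_EMULATION registry key. That default may well explain why two PCs behave differently. The sketch below is an assumption about how this could be wired up, not code from the original answer. The class and method names (BrowserEmulation, EmulationValueFor, Apply) are hypothetical, and the IE major version is passed in by the caller rather than detected.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

// Hypothetical helper, not part of the WebBrowser API.
static class BrowserEmulation
{
    const string KeyPath =
        @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION";

    // Maps an IE major version (7..11) to the registry value that renders
    // pages with a standards DOCTYPE in that version's standards mode,
    // e.g. 8000 for IE8 or 11000 for IE11.
    public static int EmulationValueFor(int ieMajorVersion)
    {
        if (ieMajorVersion < 7 || ieMajorVersion > 11)
            throw new ArgumentOutOfRangeException("ieMajorVersion");
        return ieMajorVersion * 1000;
    }

    // Windows only. Writes the value for the current executable under HKCU,
    // so no administrator rights are needed. Call this before the first
    // WebBrowser control is created.
    public static void Apply(int ieMajorVersion)
    {
        string exeName = Path.GetFileName(
            Process.GetCurrentProcess().MainModule.FileName);

        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(KeyPath))
        {
            key.SetValue(exeName, EmulationValueFor(ieMajorVersion),
                         RegistryValueKind.DWord);
        }
    }
}
```

Calling something like BrowserEmulation.Apply(9) at the very start of Main, on every machine, should make the control use the same document mode wherever that IE version or a later one is installed. Note that when running under the Visual Studio debugger, the executable name may differ from the deployed one.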

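[EDITED] There is one more suggestion, based on the structure of your code. Apparently, you perform a series of "navigate, then handle DocumentCompleted" actions. It may be more natural and easier to use async/await for this. Below is an example of doing this, with and without async/await. It also illustrates how to handle onload: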

async Task DoNavigationAsync()
{
    bool documentComplete = false;
    TaskCompletionSource<bool> onloadTcs = null;

    WebBrowserDocumentCompletedEventHandler handler = delegate
    {
        if (documentComplete)
            return; // attach to onload only once per each Document
        documentComplete = true;

        // now subscribe to DOM onload event
        this.wb.Document.Window.AttachEventHandler("onload", delegate
        {
            // each navigation has its own TaskCompletionSource
            if (onloadTcs.Task.IsCompleted)
                return; // this should not be happening

            // signal the completion of the page loading
            onloadTcs.SetResult(true);
        });
    };

    // register DocumentCompleted handler
    this.wb.DocumentCompleted += handler;

    // Navigate to http://www.example.com?i=1
    documentComplete = false;
    onloadTcs = new TaskCompletionSource<bool>();
    this.wb.Navigate("http://www.example.com?i=1");
    await onloadTcs.Task;
    // the document has been fully loaded, you can access DOM here
    MessageBox.Show(this.wb.Document.Url.ToString());

    // Navigate to http://example.com?i=2
    // could do the click() simulation instead
    documentComplete = false;
    onloadTcs = new TaskCompletionSource<bool>(); // new task for new navigation
    this.wb.Navigate("http://example.com?i=2");
    await onloadTcs.Task;
    // the document has been fully loaded, you can access DOM here
    MessageBox.Show(this.wb.Document.Url.ToString());

    // no more navigation, de-register DocumentCompleted handler
    this.wb.DocumentCompleted -= handler;
}

Here is the same code without the async/await pattern (for .NET 4.0). Note that in both cases it is still a piece of asynchronous code that returns a Task object:

Task DoNavigationAsync()
{
    // save the correct continuation context for Task.ContinueWith
    var continueContext = TaskScheduler.FromCurrentSynchronizationContext(); 

    bool documentComplete = false;
    TaskCompletionSource<bool> onloadTcs = null;

    WebBrowserDocumentCompletedEventHandler handler = delegate 
    {
        if (documentComplete)
            return; // attach to onload only once per each Document
        documentComplete = true;

        // now subscribe to DOM onload event
        this.wb.Document.Window.AttachEventHandler("onload", delegate
        {
            // each navigation has its own TaskCompletionSource
            if (onloadTcs.Task.IsCompleted)
                return; // this should not be happening

            // signal the completion of the page loading
            onloadTcs.SetResult(true);
        });
    };

    // register DocumentCompleted handler
    this.wb.DocumentCompleted += handler;

    // Navigate to http://www.example.com?i=1
    documentComplete = false;
    onloadTcs = new TaskCompletionSource<bool>();
    this.wb.Navigate("http://www.example.com?i=1");

    // Return the inner task and Unwrap() it, so that the task returned to the
    // caller completes only after the second page has loaded as well
    return onloadTcs.Task.ContinueWith(_ =>
    {
        // the document has been fully loaded, you can access DOM here
        MessageBox.Show(this.wb.Document.Url.ToString());

        // Navigate to http://example.com?i=2
        // could do the 'click()' simulation instead

        documentComplete = false;
        onloadTcs = new TaskCompletionSource<bool>(); // new task for new navigation
        this.wb.Navigate("http://example.com?i=2");

        return onloadTcs.Task.ContinueWith(delegate 
        {
            // the document has been fully loaded, you can access DOM here
            MessageBox.Show(this.wb.Document.Url.ToString());

            // no more navigation, de-register DocumentCompleted handler
            this.wb.DocumentCompleted -= handler;
        }, continueContext);

    }, continueContext).Unwrap();
}

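Here is an example of how to handle the completion of such a task:

private void Form1_Load(object sender, EventArgs e)
{
    DoNavigationAsync().ContinueWith(_ =>
    {
        MessageBox.Show("Navigation complete!");
    }, TaskScheduler.FromCurrentSynchronizationContext());
}

The benefit of using the TAP pattern here is that DoNavigationAsync is a self-contained, independent method. It can be reused, and it does not interfere with the state of the parent object (in this case, the main form).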
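
One gap remains for the Ajax popup in the question: clicking the "login" link does not necessarily trigger a navigation, so neither DocumentCompleted nor onload is guaranteed to fire again. A possible workaround, which is an assumption on my part and not something the answer above prescribes, is to poll asynchronously until the popup's fields exist. The sketch below uses a hypothetical helper (Poll.WaitUntilAsync) and needs .NET 4.5 for Task.Delay.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
static class Poll
{
    // Re-checks `condition` every `interval` until it returns true or
    // `timeout` elapses. Returns true on success, false on timeout.
    // When awaited on the UI thread, `condition` also runs on the UI thread,
    // so it can safely touch the WebBrowser DOM.
    public static async Task<bool> WaitUntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
                return false;

            // Yield to the message loop instead of blocking it
            await Task.Delay(interval);
        }
        return true;
    }
}
```

Inside an async method on the form, this might be used right after the click, for example: bool found = await Poll.WaitUntilAsync(() => webbrowserIE.Document.GetElementById("login_login") != null, TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(100)); and only when found is true would the code go on to fill in the user name and password.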