WebBrowser.Document is null when returning from a thread

Time: 2015-04-14 06:10:07

Tags: c# web-scraping

I'm trying to do some web scraping with this code:

    public static User registerUser()
    {
        User toreturn = new User();

        string csrf;

        WebBrowser webcontrol = new WebBrowser();

        webcontrol.AllowNavigation = true;
        webcontrol.ScriptErrorsSuppressed = true;
        webcontrol.Navigate("https://example.com/signup");
        webcontrol.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webcontrol_DocumentCompleted);

        HtmlElementCollection forms = webcontrol.Document.GetElementById("csrf_token").GetElementsByTagName("value");

        string tosend = forms[0].InnerText;
        toreturn.apikey = tosend;
        return toreturn;
    }

    private static void webcontrol_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {

    }

However, the whole webcontrol.Document variable is null both before and after the event.

Any idea why this happens? It's hard to scrape when blind.
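
One likely explanation, offered as an assumption rather than a confirmed diagnosis: Navigate returns immediately and the page loads asynchronously, so Document has not been populated when the next line runs. DocumentCompleted is also only raised on an STA thread that is pumping messages. Below is a minimal sketch that reads the token inside the event handler instead. It assumes a Windows Forms application with a running message loop, and that the csrf_token element carries the token in its value attribute. CsrfReader, BeginReadToken and onToken are hypothetical names, not part of the original code.

```csharp
using System;
using System.Windows.Forms;

public static class CsrfReader
{
    // Hypothetical helper. Call it on an STA thread that pumps messages,
    // for example the UI thread of a Windows Forms application.
    public static void BeginReadToken(Uri signupUrl, Action<string> onToken)
    {
        var browser = new WebBrowser();
        browser.ScriptErrorsSuppressed = true;
        bool done = false;

        // Subscribe before navigating so the event cannot be missed.
        browser.DocumentCompleted += (sender, e) =>
        {
            // The event also fires for frames; wait for the top-level page.
            if (done || e.Url != browser.Url) return;
            done = true;

            HtmlElement field = browser.Document.GetElementById("csrf_token");
            // "value" is an attribute, not a tag name, so GetAttribute is used.
            string token = field != null ? field.GetAttribute("value") : null;
            onToken(token);
        };

        browser.Navigate(signupUrl);
    }
}
```

The callback receives null when no element with the id csrf_token exists, so the caller can tell a missing field apart from a page that never finished loading.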

Edit: It's worth noting that I obviously need to use the value I get back.

So now I've got it loading with the help below, but I can't get the thread to return the value..

    public static User registerUser()
    {
        Uri test = new Uri("https://www.example.com/signup");
        HtmlDocument testdoc = runBrowserThread(test);

        string tosend = "test";

        User user = new User();

        user.apikey = tosend;

        return user;

    }
    public static HtmlDocument runBrowserThread(Uri url)
    {
        HtmlDocument value = null;
        var th = new Thread(() =>
        {
            var br = new WebBrowser();
            br.DocumentCompleted += browser_DocumentCompleted;
            br.Navigate(url);
            value = br.Document;
            Application.Run();
        });
        th.SetApartmentState(ApartmentState.STA);
        th.Start();
        th.Join(8000); 
        return value;
    }

    static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var br = sender as WebBrowser;
        if (br.Url == e.Url)
        {
            Console.WriteLine("Navigated to {0}", e.Url);
            Console.WriteLine(br.Document.Body.InnerHtml);
            System.Console.ReadLine();
            Application.ExitThread();   // Stops the thread
        }
    }

System.Console.WriteLine works - I see the HTML! Joy! (Although it's Cloudflare's page, and I should be whitelisted)

But the thread returns null..
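
A possible reason, stated as an assumption: `value = br.Document` runs right after Navigate, before the page has loaded, and nothing assigns `value` again once DocumentCompleted fires. The Console.ReadLine call in the handler also blocks the browser thread until Enter is pressed, which can outlast the 8-second Join. An HtmlDocument is also tied to the thread that created the browser, so copying a plain string out of the handler is safer than returning the document itself. Below is a minimal sketch along those lines. BrowserThread and FetchBodyHtml are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public static class BrowserThread
{
    // Hypothetical helper: loads url on a dedicated STA thread and returns the
    // body HTML, or null if the page did not finish loading within the timeout.
    public static string FetchBodyHtml(Uri url, TimeSpan timeout)
    {
        string html = null;

        var th = new Thread(() =>
        {
            using (var br = new WebBrowser())
            {
                br.ScriptErrorsSuppressed = true;
                br.DocumentCompleted += (sender, e) =>
                {
                    if (br.Url != e.Url) return;   // ignore frame completions

                    HtmlElement body = br.Document.Body;
                    // Copy a plain string while still on the browser's own thread.
                    html = (body != null ? body.InnerHtml : null) ?? string.Empty;
                    Application.ExitThread();      // makes Application.Run() return
                };
                br.Navigate(url);
                Application.Run();                 // message loop; the event needs it
            }
        });
        th.SetApartmentState(ApartmentState.STA);
        th.IsBackground = true;                    // do not keep the process alive on timeout
        th.Start();

        // Join returns true only if the thread has finished within the timeout.
        return th.Join(timeout) ? html : null;
    }
}
```

A call such as `string html = BrowserThread.FetchBodyHtml(new Uri("https://www.example.com/signup"), TimeSpan.FromSeconds(8));` would then replace the `runBrowserThread(test)` call, with a null result meaning the load timed out.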

0 answers