Web data scraper in .NET without HttpWebRequest

Time: 2012-11-07 11:17:14

Tags: c# .net

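There are some .NET classes that use the IE component. With them we can read the HTML DOM, log in, and scrape data. Can anybody give me the names of those classes/components? I used them myself for login and data scraping a long time ago.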

2 answers:

Answer 0: (score: 1)

If you want to scrape data from an existing web page, consider HtmlAgilityPack.
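
As a rough illustration of that suggestion, here is a minimal sketch of pulling table cells out of markup with HtmlAgilityPack. It assumes the HTML is already available as a string; the class name `TableScrapeSketch`, the method `ExtractRows`, and the table id passed in are hypothetical names made up for this example. HtmlAgilityPack only parses markup, so logging in and fetching the page would still have to happen some other way.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET Framework

static class TableScrapeSketch
{
    // Returns the trimmed cell text of every row in the table with the given id.
    public static List<List<string>> ExtractRows(string html, string tableId)
    {
        var result = new List<List<string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null rather than an empty collection when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@id='" + tableId + "']//tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null) continue; // e.g. a header row made only of <th> cells

            var values = new List<string>();
            foreach (HtmlNode cell in cells)
                values.Add(cell.InnerText.Trim());
            result.Add(values);
        }
        return result;
    }
}
```

Calling `TableScrapeSketch.ExtractRows(pageHtml, "results")` would return one list of strings per data row, where `pageHtml` and `"results"` stand in for your own page source and table id.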

Answer 1: (score: 1)

I was able to use Richard's suggestion. A minor issue remains with getting the HTMLTableRowCollection from the HTMLTable. My code looks like this:

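        // Drive a real Internet Explorer instance through COM automation
        // (SHDocVw for the browser object, mshtml for the DOM types).
        object o = null;
        InternetExplorer ie = new InternetExplorerClass();
        IWebBrowserApp wb = ie;
        wb.Visible = chkShowBrowser.Checked;
        wb.Navigate("http://LoginPage.aspx", ref o, ref o, ref o, ref o);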

        // Wait for the page to load, checking every 10 seconds.
        do
        {
            Thread.Sleep(10000);
        } while (wb.Busy);

        if (ie.Document != null)
        {
            var myDoc = ie.Document as HTMLDocument;

            if (myDoc != null)
            {
                var oUserName = (HTMLInputTextElement)myDoc.getElementById("ctl00_MainBodyPlaceholder_PublicPortalLogin_UserName");
                oUserName.value = ConfigurationManager.AppSettings.Get("userName");

                var oPassword =
                    (HTMLInputTextElement)
                    myDoc.getElementById("ctl00_MainBodyPlaceholder_PublicPortalLogin_Password");
                oPassword.value = ConfigurationManager.AppSettings.Get("password");

                var btnSubmitLogin =
                    (HTMLInputElement)myDoc.getElementById("ctl00_MainBodyPlaceholder_PublicPortalLogin_Login");
                btnSubmitLogin.click();

                do
                {
                    Thread.Sleep(10000);
                } while (wb.Busy);


                if (ie.Document != null)
                {
                    wb.Navigate("http://SearchPage.aspx", ref o, ref o, ref o, ref o);


                    do
                    {
                        Thread.Sleep(10000);
                    } while (wb.Busy);



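                    // Re-read the document: myDoc still refers to the login page
                    // that was loaded before the navigations above.
                    myDoc = ie.Document as HTMLDocument;
                    if (myDoc != null)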
                    {

                        var oIncidentNumber =
                            (HTMLInputTextElement)
                            myDoc.getElementById("ctl00_MainBodyPlaceholder_txtIncidentNumber");
                        oIncidentNumber.value = ConfigurationManager.AppSettings.Get("incidentNumber");

                        var btnTicketNumberSearch =
                            (HTMLInputElement)myDoc.getElementById("ctl00_MainBodyPlaceholder_btnSearch");
                        btnTicketNumberSearch.click();

                        do
                        {
                            Thread.Sleep(10000);
                        } while (wb.Busy);

                        HTMLTable searchResultTable = myDoc.getElementById("ctl00_MainBodyPlaceholder_gdView_DXMainTable") as HTMLTable;



                        if (searchResultTable != null)
                        {
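                            // One possible way to walk the rows; assumes the mshtml
                            // interop types HTMLTableRow and HTMLTableCell.
                            foreach (HTMLTableRow row in searchResultTable.rows)
                            {
                                foreach (HTMLTableCell cell in row.cells)
                                    System.Diagnostics.Debug.WriteLine(cell.innerText);
                            }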
                        }

                        if (chkRenderBody.Checked)
                        {
                            txtFinalTextBox.Text = myDoc.body.outerHTML;
                        }
                    }
                }
            }
        }
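
The code above repeats the same `do { Thread.Sleep(10000); } while (wb.Busy);` loop four times, and each loop blocks for at least ten seconds with no upper limit. One possible way to tidy that up is a small polling helper with a timeout. This is only a sketch: `Poll` and `WaitUntil` are hypothetical names, not part of SHDocVw or mshtml.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Poll
{
    // Checks 'condition' every 'interval' until it returns true or 'timeout' elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout) return false;
            Thread.Sleep(interval);
        }
        return true;
    }
}
```

In the answer's code, each wait loop could then become something like `Poll.WaitUntil(() => !wb.Busy, TimeSpan.FromSeconds(60), TimeSpan.FromMilliseconds(250))`. Note that the original loops sleep before the first check, while this helper checks immediately, so a short delay after `Navigate` or `click()` may still be needed to give the browser time to start loading.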