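Extracting text values from an HTML source file

Time: 2012-11-18 16:40:48

Tags: c# asp.net lambda

In this code, var TempTxt contains the HTML body content as a string. How can I use lambda syntax to extract the inner text / HTML of the <td> elements inside a <table>?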

    public string ExtractPageValue(IWebDriver DDriver, string url = "")
    {
        if (string.IsNullOrEmpty(url))
            url = @"http://www.boi.org.il/he/Markets/ExchangeRates/Pages/Default.aspx";

        // directory is not shown here; it is the folder that holds the IE driver server
        var service = InternetExplorerDriverService.CreateDefaultService(directory);
        service.LogFile = directory + @"\seleniumlog.txt";
        service.LoggingLevel = InternetExplorerDriverLogLevel.Trace;

        var options = new InternetExplorerOptions();
        options.IntroduceInstabilityByIgnoringProtectedModeSettings = true;

        // Note: this replaces whatever driver instance was passed in
        DDriver = new InternetExplorerDriver(service, options, TimeSpan.FromSeconds(60));
        DDriver.Navigate().GoToUrl(url);

        // Full HTML of the loaded page as a string
        var TempTxt = DDriver.PageSource;
        return ""; //Math.Round(Convert.ToDouble(TempTxt.Split(' ')[10]), 2).ToString();
    }

1 answer:

Answer 0 (score: 1)

If you are willing to try HtmlAgilityPack:

    // html is the page source string (TempTxt in the question)
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    // One inner list of cell texts per table row
    var table = doc.DocumentNode.SelectNodes("//table/tr")
                   .Select(tr => tr.Elements("td").Select(td => td.InnerText).ToList())
                   .ToList();
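
For completeness, here is a minimal self-contained sketch of the same idea, assuming the HtmlAgilityPack NuGet package is installed. The class name TableTextExtractor, the method ExtractCells, and the sample markup are made up for illustration and are not part of the question's code. It differs from the snippet above in two hedged ways: it uses "//table//tr" because page source serialized by a browser often wraps rows in <tbody>, in which case "//table/tr" would match nothing, and it checks for null because SelectNodes returns null rather than an empty collection when there is no match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class TableTextExtractor
{
    // Hypothetical helper: returns one list of cell texts per table row.
    public static List<List<string>> ExtractCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//table//tr" also matches rows nested inside <tbody>.
        var rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null)
            return new List<List<string>>(); // no table rows found

        return rows
            .Select(tr => tr.Elements("td")
                            // Decode entities such as &amp; and trim whitespace
                            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                            .ToList())
            .Where(cells => cells.Count > 0) // skip rows without <td> (e.g. header rows)
            .ToList();
    }

    public static void Main()
    {
        // Sample markup invented for this example, not real exchange-rate data
        const string html =
            "<html><body><table><tbody>" +
            "<tr><th>Currency</th><th>Rate</th></tr>" +
            "<tr><td>USD</td><td> 3.85 </td></tr>" +
            "<tr><td>EUR</td><td>4.95</td></tr>" +
            "</tbody></table></body></html>";

        foreach (var cells in ExtractCells(html))
            Console.WriteLine(string.Join(" | ", cells));
    }
}
```

In the question's method you would presumably call ExtractCells(TempTxt) right after reading DDriver.PageSource and then pick the row and cell you need from the result. If the page contains several tables, a more specific XPath (for example one that selects the table by id or class) is likely a better choice than matching every table.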