Using HtmlAgilityPack to extract labels and values from a web page table built from divs

Posted: 2020-03-29 21:24:37

Tags: c# web-scraping html-agility-pack

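There is a web page from which I want to extract the labels and values in a table, but the table is not built with the usual "tr" and "td" HTML tags. Instead it uses "div class" elements, and I can't find a code example for this case. Below is the HTML snippet that makes up the table I'm working with.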

<div ng-controller="stockDetailSummary" ng-cloak>
    <div class="stockDetail" ng-show="quoteFound">
        <div ng-show="DataLoaded">

             <h2>Stock Scorecard</h2>
                <div class="Column Left">
            <div class="row">
                <div class="label">Market Cap</div>
                <div class="value" ng-if="quoteObj.currentMarketCap">{{quoteObj.currentMarketCap}}</div>
                <div class="value" ng-if="!quoteObj.currentMarketCap">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Yield</div>
                <div class="value" ng-if="quoteObj.lastDividentReported">{{quoteObj.lastDividentReported | number:2 }}%<span ng-if="quoteObj.yield">%</span></div>
                <div class="value" ng-if="!quoteObj.lastDividentReported">- -</div>
            </div>  
            <div class="row">
                <div class="label">Quarterly Dividend</div>
                <div class="value" ng-if="quoteObj.dividend">{{quoteObj.dividend | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.dividend">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Open</div>
                <div class="value" ng-if="quoteObj.open">{{quoteObj.open | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.open">- -</div>
            </div>
            <div class="row">
                <div class="label">High</div>
                <div class="value" ng-if="quoteObj.high">{{quoteObj.high | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.high">- -</div>
            </div>  
            <div class="row alt">
                <div class="label">Low</div>
                <div class="value" ng-if="quoteObj.low">{{quoteObj.low | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.low">- -</div>
            </div>
            <div class="row">
                <div class="label">Previous Close</div>
                <div class="value" ng-if="quoteObj.previousClosePrice">{{quoteObj.previousClosePrice | number:2}}</div>
                <div class="value" ng-if="!quoteObj.previousClosePrice">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Volume</div>
                <div class="value" ng-if="quoteObj.totalVolume">{{quoteObj.totalVolume | number}}</div>
                <div class="value" ng-if="!quoteObj.totalVolume">- -</div>
            </div>
            <div class="row">
                <div class="label">52-Week High</div>
                <div class="value" ng-if="quoteObj.yearHigh">{{quoteObj.yearHigh | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.yearHigh">- -</div>
            </div>
            <div class="row alt">
                <div class="label">52-Week Low</div>
                <div class="value" ng-if="quoteObj.yearLow">{{quoteObj.yearLow | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.yearLow">- -</div>
            </div>
        </div>
        <div class="Column Right">
            <div class="row">
                <div class="label">1 Year Total % Return</div>
                <div class="value" ng-if="quoteObj.oneYearReturn">{{quoteObj.oneYearReturn | number:2}}</div>
                <div class="value" ng-if="!quoteObj.oneYearReturn">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Earnings Per Share</div>
                <div class="value" ng-if="quoteObj.eps">{{quoteObj.eps | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.eps">- -</div>
            </div>
            <div class="row">
                <div class="label">Price/Earnings Ratio</div>
                <div class="value"ng-if="quoteObj.peRatio">{{quoteObj.peRatio | number:2 }}</div>
                <div class="value" ng-if="!quoteObj.peRatio">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Price-to-Book Ratio</div>
                <div class="value" ng-if="quoteObj.priceToBookRatio">{{quoteObj.priceToBookRatio | number:2}}</div>
                <div class="value" ng-if="!quoteObj.priceToBookRatio">- -</div>
            </div>
            <div class="row">
                <div class="label">Price-to-Sales Ratio</div>
                <div class="value" ng-if="quoteObj.priceToSalesRatio">{{quoteObj.priceToSalesRatio | number:2}}</div>
                <div class="value" ng-if="!quoteObj.priceToSalesRatio">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Shares Outstanding</div>
                <div class="value" ng-if="quoteObj.shareOutstanding">{{quoteObj.shareOutstanding | number:2}} M</div>
                <div class="value" ng-if="!quoteObj.shareOutstanding">- -</div>
            </div>
            <div class="row">
                <div class="label">30 Day Average Volume</div>
                <div class="value" ng-if="quoteObj.thirtyDayAvgVolume">{{quoteObj.thirtyDayAvgVolume}}</div>
                <div class="value" ng-if="!quoteObj.thirtyDayAvgVolume">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Exchange</div>
                <div class="value" ng-if="quoteObj.exchange">{{quoteObj.exchange}}</div>
                <div class="value" ng-if="!quoteObj.exchange">- -</div>
            </div>
            <div class="row">
                <div class="label">Currency</div>
                <div class="value" ng-if="quoteObj.currency">{{quoteObj.currency}}</div>
                <div class="value" ng-if="!quoteObj.currency">- -</div>
            </div>
            <div class="row alt">
                <div class="label">Expected Reporting Date</div>
                <div class="value" ng-if="quoteObj.expectedReportDate">{{quoteObj.expectedReportDate | date:'MMM d, yyyy'}}</div>
                <div class="value" ng-if="!quoteObj.expectedReportDate">- -</div>
            </div>
        </div>
        <div class="clear"></div>
    </div>      
    </div>
    <div class="not-found" ng-show="!quoteFound">
        <p>There was a problem retrieving the data</p>
    </div>
</div>

You can also view it at Bloomberg Stock Quote for ENB:CT.

I found the XPath of the table I'm looking for and tried the following code:

    using System;
    using HtmlAgilityPack;

    var web = new HtmlWeb();
    HtmlDocument jkdoc = web.Load("https://www.bnnbloomberg.ca/stock/ENB:CT");
    HtmlNode jnode = jkdoc.DocumentNode.SelectSingleNode("/html[1]/body[1]/div[3]/div[3]/section[1]/div[3]/div[2]/div[3]/div[1]/div[1]/div[1]/div[1]/div[1]/table[1]");
    // SelectSingleNode returns null if the path does not exist in the downloaded HTML
    string value = jnode?.InnerText ?? "(node not found)";
    Console.WriteLine(value);
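A minimal sketch of selecting label/value pairs from the div-based layout in the snippet above by class name, rather than with an absolute XPath. It assumes every row is a `div` whose class list contains `row` and that holds one `div class="label"` followed by one or more `div class="value"`. The names `ScorecardParser` and `ExtractRows` are hypothetical. Run against the raw server HTML, the value column still holds the `{{...}}` binding text.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ScorecardParser
{
    // Hypothetical helper: returns (label, value) pairs from the div-based "table".
    public static List<KeyValuePair<string, string>> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();

        // Matches class="row" and class="row alt", but not e.g. class="rows".
        var rows = doc.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' row ')]");
        if (rows == null)
            return result; // SelectNodes returns null when nothing matches

        foreach (var row in rows)
        {
            // "./" keeps the query relative to the current row.
            var label = row.SelectSingleNode("./div[@class='label']");
            // Take the first value div: in unrendered HTML both ng-if
            // branches are present because no script has removed either one.
            var value = row.SelectSingleNode("./div[@class='value']");
            if (label == null || value == null)
                continue;

            result.Add(new KeyValuePair<string, string>(
                HtmlEntity.DeEntitize(label.InnerText).Trim(),
                HtmlEntity.DeEntitize(value.InnerText).Trim()));
        }
        return result;
    }
}
```

Against the snippet in the question, the first pair would be `Market Cap` / `{{quoteObj.currentMarketCap}}`.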

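The string printed to the console contains the labels of the various fields, but not the actual values. Instead, I get something like {{revenue | replaceEmptyValue}}.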

Is this some kind of timing issue? Do I have to wait for the page to finish loading and populate the values?

1 answer:

Answer 0 (score: 1)

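The table appears to be just a template for the Angular JavaScript framework (the ng-controller and ng-if attributes are AngularJS directives). This means the page's scripts have to run before the template is rendered and filled with data. HtmlAgilityPack only parses the HTML received directly from the server, which here is the unrendered template markup. So this is not a timing problem: waiting longer will not help, because HtmlAgilityPack never executes JavaScript.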
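A scraper can at least detect this situation instead of silently storing template text. Below is a minimal sketch, assuming you already have the text of a value node (for example from `InnerText`). `TemplateCheck` and `HasUnrenderedBinding` are hypothetical names, and the regex only looks for AngularJS-style `{{ ... }}` interpolation.

```csharp
using System.Text.RegularExpressions;

public static class TemplateCheck
{
    // AngularJS interpolation markers look like {{ expression }}.
    private static readonly Regex Binding =
        new Regex(@"\{\{.*?\}\}", RegexOptions.Singleline);

    // True if the text still contains an unevaluated {{...}} binding,
    // i.e. the HTML was read before any JavaScript rendered it.
    public static bool HasUnrenderedBinding(string text) =>
        !string.IsNullOrEmpty(text) && Binding.IsMatch(text);
}
```

If it returns true, re-parsing the same HTML or waiting longer will not produce the values.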

This question boils down to essentially the same thing and may point you in the right direction.