Trouble scraping HTML when it is inconsistent

Time: 2015-05-22 19:51:38

Tags: c# html csquery

I'm new to CsQuery, and I'm having trouble scraping HTML like the following:

<li id="Ingredient">
    <span id="Amount" class="ingredient-amount">1 pound</span>
    <span id="Name" class="ingredient-name">sweet Italian sausage</span>
</li>
<li id="Ingredient">
    <span id="Amount" class="ingredient-amount">3/4 pound</span>
    <span id="Name" class="ingredient-name">lean ground beef</span>
</li>

I want to extract the text inside the span tags and format it like this:

1 pound sweet Italian sausage
3/4 pound lean ground beef

Here is my code:

for (int i = 0; i < dom.Select("#Ingredient").Length; ++i) {
    // Take the i-th Amount and the i-th Name from two page-wide selections
    if (dom.Select("#Ingredient span#Amount")[i] != null)
        Console.Write(dom.Select("#Ingredient span#Amount")[i].InnerHTML + " ");
    if (dom.Select("#Ingredient span#Name")[i] != null)
        Console.Write(dom.Select("#Ingredient span#Name")[i].InnerHTML);
    // End the line for this ingredient
    Console.WriteLine();
}

The HTML above works fine, but there is a problem when one of the spans is missing. For example, if <span id="Name" class="ingredient-name">sweet Italian sausage</span> is missing from the HTML, my code returns:

1 pound lean ground beef
3/4 pound
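The shift can be reproduced without CsQuery or HTML at all. The loop effectively builds two separate lists, every Amount on the page and every Name on the page, and pairs them by position i. Once the first Name span is missing, the Name list is one item shorter and "lean ground beef" lands in slot 0. The sketch below models that with plain string lists; `IndexPairingDemo` and `PairByIndex` are hypothetical names, the `i < names.Count` check stands in for the null check in the question, and the loop bound uses the Amount count instead of the `#Ingredient` count (the two are equal in this example).

```csharp
using System.Collections.Generic;

// Hypothetical model of the question's loop, with no CsQuery involved.
public static class IndexPairingDemo
{
    // amounts and names are collected separately across the whole page,
    // then paired by position i, just like the two dom.Select(...)[i] calls.
    public static List<string> PairByIndex(IList<string> amounts, IList<string> names)
    {
        var lines = new List<string>();
        for (int i = 0; i < amounts.Count; ++i)
        {
            string line = amounts[i];
            // The i-th Name on the page is not necessarily the Name of the i-th <li>.
            if (i < names.Count)
                line += " " + names[i];
            lines.Add(line);
        }
        return lines;
    }
}
```

Calling `PairByIndex(new[] { "1 pound", "3/4 pound" }, new[] { "lean ground beef" })` yields "1 pound lean ground beef" and "3/4 pound", the same misaligned output shown above.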

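As you can see, "lean ground beef" moved up a line. I want it to stay with "3/4 pound" no matter what, while "1 pound" can be left on its own. How can I do this? I've tried many approaches, but none of them worked. So what I want to do is: for each "#Ingredient", write the "#Amount" if it exists and the "#Name" if it exists, and do not bother with things in another Ingredient.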
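What the last sentence describes is scoping each lookup to its own list item: find the `<li>` elements first, then search for Amount and Name only inside the current item, so a missing span leaves a gap in that line instead of borrowing text from the next ingredient. CsQuery is not part of the standard library, so this is a self-contained stand-in that swaps in System.Xml.Linq and assumes a well-formed copy of the markup (an XML parser rejects the malformed HTML found on many real pages, which is what an HTML parser like CsQuery is for). `IngredientScraper`, `SpanText`, and `FormatIngredients` are hypothetical names. In CsQuery the analogous step would presumably be to iterate the `#Ingredient` matches and call `Find` on each match rather than selecting from `dom`; that translation is an untested assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Stand-in sketch using System.Xml.Linq (NOT CsQuery) to show per-<li> scoping.
public static class IngredientScraper
{
    // Text of the first child <span> of this <li> whose id equals `id`,
    // or null when this particular <li> has no such span.
    static string SpanText(XElement li, string id) =>
        li.Elements("span")
          .FirstOrDefault(s => (string)s.Attribute("id") == id)
          ?.Value.Trim();

    // One output line per <li id="Ingredient">; lookups never leave the current <li>.
    public static List<string> FormatIngredients(string xml)
    {
        var root = XElement.Parse(xml);
        var lines = new List<string>();
        foreach (var li in root.Elements("li").Where(e => (string)e.Attribute("id") == "Ingredient"))
        {
            // Keep whichever of Amount and Name exist, in that order.
            var parts = new[] { SpanText(li, "Amount"), SpanText(li, "Name") }
                .Where(p => !string.IsNullOrEmpty(p));
            lines.Add(string.Join(" ", parts));
        }
        return lines;
    }
}
```

For the two ingredients wrapped in a `<ul>` with the first Name span removed, `FormatIngredients` returns "1 pound" and "3/4 pound lean ground beef", which is the pairing the question asks for.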

0 answers:

No answers