Read data from a string from start to end

Time: 2012-03-07 09:33:07

Tags: c# html-agility-pack

<td valign="top" class="m92_h_bigimg">
    <a href="#pagetop" onclick="drop1('hotelinfos');" title="Hotelinformationen einblenden"><img border=0 src="http://i2.giatamedia.de/s.php?uid=168846&source=xml&size=320&vea=5vf&cid=2492&file=007399_8790757.jpg" name="bigpic"></a>
</td>
<td valign="top" class="m92_h_bigimg2">
    <table border=0 cellpadding=0 cellspacing=0>
        <tr>
          <td valign="top" class="m92_h_para">Hotel:</td>
          <td valign="top" class="m92_h_name">
                Melia Tropical              <br>
                <img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*"><img src="/images/star.gif" height=13 width=13 alt="*">             
            </td>
      </tr>
        <tr>
          <td valign="top" class="m92_h_para">Zimmer:</td>
            <td valign="top" class="m92_h_wert"><b>Suite</b></td>
      </tr>
        <tr>
          <td valign="top" class="m92_h_para">Verpflegung:</td>
          <td valign="top" class="m92_h_wert"><b>All Inclusive</b></td>
      </tr>
      <tr>
          <td valign="top" class="m92_h_para">Ort:</td>
            <td valign="top" class="m92_h_wert">Punta Cana</td>
      </tr>
        <tr>
          <td valign="top" class="m92_h_para">Region:</td>
          <td valign="top" class="m92_h_wert">Punta Cana</td>
      </tr>
        <tr>
            <td valign="top" class="m92_h_para">Land:</td>
          <td valign="top" class="m92_h_wert">Dom. Republik</td>
        </tr>
        <tr>
          <td valign="top" class="m92_h_para">Anbieter:</td>
          <td valign="top" class="m92_h_wert"><a href="javascript:VA('5VF');"><img border=0 src="http://www.lmweb.net/lmi/va/gifs/5VF.gif" alt="5 vor Flug" title="5 vor Flug"></a><br>5 vor Flug</td>
      </tr>
    </table>
    <table border=0 cellpadding=0 cellspacing=0>
        <tr>
          <td><img src="/images/dropleftw.gif" height="16" width="18"></td>
            <td>
                <div id="mark" class="m92_notice">

                        &nbsp;<a target="vakanz" href="siteplus/reminder.php?session_id=rslr1ejntpmj07n0f2smqfhsj5&REC=147203&m_flag=1&m_typ=hotel">Dieses Hotel merken</a>
        </div>
          </td>
      </tr>
      <tr>
          <td><img src="/images/dropleftw.gif" height="16" width="18"></td>
            <td>
            <div class="m92_notice">
            &nbsp;<a href="#pagetop" onclick="drop1('hotelinfos'); drop1('hoteltermine');">Hotelbewertung anzeigen</a>
            </div>
          </td>
      </tr>
    </table>
</td>

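Using HtmlAgilityPack, how do I get the data between <td valign="top" class="m92_h_bigimg"> and its closing </td>? I tried the following code without HtmlAgilityPack. It runs, but it stops at the first </td> it finds, so the result is not correct. I have read that HtmlAgilityPack is the best solution for this kind of problem.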

public static string[] GetStringInBetween(string strBegin, string strEnd, string strSource, bool includeBegin, bool includeEnd)
{
    string[] result = { "", "" };
    int iIndexOfBegin = strSource.IndexOf(strBegin, StringComparison.Ordinal);

    if (iIndexOfBegin != -1)
    {
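        // Search for the end marker only after the begin marker, so the two cannot overlap.
        int iEnd = strSource.IndexOf(strEnd, iIndexOfBegin + strBegin.Length, StringComparison.Ordinal);

        if (iEnd != -1)
        {
            // Substring takes (start, length), so the length must be measured from the same start.
            int iStart = iIndexOfBegin + (includeBegin ? 0 : strBegin.Length);
            result[0] = strSource.Substring(iStart, iEnd + (includeEnd ? strEnd.Length : 0) - iStart);

            if (iEnd + strEnd.Length < strSource.Length)
                result[1] = strSource.Substring(iEnd + strEnd.Length);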
        }
    }
    return result;
}
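
To make the failure concrete, here is a minimal sketch of the same first-match search on a nested cell. The input string and the names NestedTdDemo and NaiveSlice are made up for illustration; only IndexOf and Substring from the code above are used.

```csharp
using System;

public static class NestedTdDemo
{
    // Returns the text between 'begin' and the FIRST 'end' that follows it,
    // which is what a plain IndexOf-based search does. Returns null if either marker is missing.
    public static string NaiveSlice(string html, string begin, string end)
    {
        int beginIndex = html.IndexOf(begin, StringComparison.Ordinal);
        if (beginIndex == -1) return null;

        int start = beginIndex + begin.Length;
        int stop = html.IndexOf(end, start, StringComparison.Ordinal);
        if (stop == -1) return null;

        return html.Substring(start, stop - start);
    }

    public static void Main()
    {
        // Hypothetical, shortened input: an outer <td> that contains a nested <td>.
        string html = "<td class=\"outer\"><table><tr><td>inner</td></tr></table> tail</td>";

        // The first "</td>" closes the nested cell, not the outer one,
        // so everything after "inner" is lost.
        Console.WriteLine(NaiveSlice(html, "<td class=\"outer\">", "</td>"));
        // prints: <table><tr><td>inner
    }
}
```

A string search has no notion of which closing tag belongs to which opening tag, which is why a parser is the better tool here.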

How can I do this?

2 answers:

Answer 0 (score: 1)

HtmlAgilityPack supports standard XPath queries, so I think you can do something like this:

var nodes = doc.DocumentNode.SelectNodes("//td[@class='m92_h_bigimg']");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in nodes)
    {
        // Do work on your node.
    }
}

...where doc is an instance of HtmlDocument.
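
For completeness, a fuller sketch of the XPath approach, from loading the markup to reading the cell content. The HTML string is a shortened stand-in for the page in the question, and the names XPathDemo and GetBigImgCellHtml are hypothetical. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class XPathDemo
{
    // Returns the inner HTML of the first <td class="m92_h_bigimg">, or null if there is none.
    public static string GetBigImgCellHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath expression.
        HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td[@class='m92_h_bigimg']");

        // InnerHtml is everything between the opening tag and its matching close,
        // nested tags included.
        return cell == null ? null : cell.InnerHtml;
    }

    public static void Main()
    {
        // Hypothetical, shortened stand-in for the markup in the question.
        string html =
            "<table><tr>" +
            "<td valign=\"top\" class=\"m92_h_bigimg\"><a href=\"#pagetop\"><img name=\"bigpic\"></a></td>" +
            "<td valign=\"top\" class=\"m92_h_bigimg2\"><table><tr><td>Hotel:</td></tr></table></td>" +
            "</tr></table>";

        Console.WriteLine(GetBigImgCellHtml(html) ?? "(no matching cell)");
    }
}
```

Note that the XPath test @class='m92_h_bigimg' is an exact string comparison, so it does not match the neighbouring cell with class m92_h_bigimg2.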

Answer 1 (score: 1)

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(html);
var str = htmlDoc.DocumentNode
    .Descendants("td")
    .Where(x => x.Attributes["class"] != null && x.Attributes["class"].Value == "m92_h_bigimg")
    .Select(x => x.InnerHtml)
    .First();
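
One caveat with the snippet above: First() throws an InvalidOperationException when no cell matches. A slightly more defensive variant might look like the sketch below. The names LinqDemo and InnerHtmlOfCell are hypothetical and the sample markup is shortened; GetAttributeValue and FirstOrDefault are used in place of the manual null check and First().

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinqDemo
{
    // Returns the inner HTML of the first <td> whose class equals cssClass, or null if none exists.
    public static string InnerHtmlOfCell(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("td")
            // GetAttributeValue returns the supplied default ("") when the attribute is absent.
            .Where(td => td.GetAttributeValue("class", "") == cssClass)
            .Select(td => td.InnerHtml)
            // FirstOrDefault yields null instead of throwing when nothing matches.
            .FirstOrDefault();
    }

    public static void Main()
    {
        // Hypothetical, shortened stand-in for the markup in the question.
        string html =
            "<table><tr><td class=\"m92_h_bigimg\"><img name=\"bigpic\"></td></tr></table>";

        Console.WriteLine(InnerHtmlOfCell(html, "m92_h_bigimg") ?? "(no matching cell)");
        Console.WriteLine(InnerHtmlOfCell(html, "does_not_exist") ?? "(no matching cell)");
    }
}
```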