Get the contents of a div with class "in" using C#

Time: 2012-09-15 01:41:42

Tags: c# html web-scraping

How can I get the contents of one or more divs with the class "in" using C#?

I have the following HTML:

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title></title>
</head>
<body>
    <div id="xxx">
        <div class="in">
            <a href="/a/show/7184569" class="mm">ВАЗ 2121</a> <span class="for">за</span>
            <span class="price">2 700 $</span>
            <br />
            <span class="year">1990 г.</span><br />
            <div style="margin: 3px 0 3px 0">contentxxx</div>
        </div>
    </div>
</body>
</html>

I want to get the div with class="in", so the result should be:

<div class="in">
     <a href="/a/show/7184569" class="mm">ВАЗ 2121</a> <span class="for">за</span>
     <span class="price">2 700 $</span>
     <br />
     <span class="year">1990 г.</span><br />
     <div style="margin: 3px 0 3px 0">contentxxx</div>
</div>

3 answers:

Answer 0 (score: 2)

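using HtmlAgilityPack;

static void Parse()
        {
            // Parse the HTML string; HtmlWeb is only needed when downloading from a URL.
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(getHTML());

            // SelectNodes returns null when nothing matches the XPath.
            HtmlNodeCollection nodeCol = doc.DocumentNode.SelectNodes("//div[@class=\"in\"]");
            if (nodeCol == null)
                return;

            // OuterHtml includes the <div class="in"> tag itself, as the question asks;
            // InnerHtml would return only the markup inside it.
            string value = nodeCol[0].OuterHtml;
        }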

        static string getHTML()
        {
            string retVal = "";

            retVal = @"<!DOCTYPE html>"
                     + "<html lang=\"en\" xmlns=\"http://www.w3.org/1999/xhtml\">"
                    + "<head>"
                        + "<meta charset=\"utf-8\" />"
                        + "<title></title>"
                    + "</head>"
                    + "<body>"
                        + "<div id=\"xxx\">"
                            + "<div class=\"in\">"
                                + "<a href=\"/a/show/7184569\" class=\"mm\">ВАЗ 2121</a> <span class=\"for\">за</span>"
                                + "<span class=\"price\">2 700 $</span>"
                                + "<br />"
                                + "<span class=\"year\">1990 г.</span><br />"
                                + "<div style=\"margin: 3px 0 3px 0\">contentxxx</div>"

                            + "</div>"
                        + "</div>"
                    + "</body>"
                    + "</html>";

            return retVal;
        }

Add the HtmlAgilityPack namespace. Reference: http://htmlagilitypack.codeplex.com/releases/view/90925
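Since the question mentions "one or more" divs, here is a minimal sketch of how all matches could be collected. It assumes the HtmlAgilityPack package is referenced; `DivExtractor` and `ExtractByClass` are hypothetical names chosen for illustration, and the sample markup in `Main` is made up. The XPath pads the class attribute with spaces so that class="in featured" also matches while class="inner" does not.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class DivExtractor
{
    // Hypothetical helper: returns the outer HTML of every <div> whose
    // class list contains className. Assumes className is a plain token
    // without quotes or spaces, since it is concatenated into the XPath.
    public static List<string> ExtractByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the normalized class attribute with spaces so that "in" matches
        // class="in" and class="in featured" but not class="inner".
        string xpath = "//div[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
            return results;

        foreach (HtmlNode node in nodes)
            results.Add(node.OuterHtml);

        return results;
    }

    static void Main()
    {
        // Made-up sample input with two matching divs and one non-matching div.
        string html = "<div id=\"xxx\">"
                    + "<div class=\"in\">first</div>"
                    + "<div class=\"in featured\">second</div>"
                    + "<div class=\"inner\">not matched</div>"
                    + "</div>";

        foreach (string block in ExtractByClass(html, "in"))
            Console.WriteLine(block);
    }
}
```

Returning an empty list instead of null keeps the caller's loop simple. If only the markup inside each div is needed, `InnerHtml` could be used in place of `OuterHtml`.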

Answer 1 (score: 0)

You can do this easily with the HTML Agility Pack:

using HtmlAgilityPack;

...
var doc = new HtmlDocument();
doc.Load(@"C:\file.htm"); // see the overloads; you can also use the `LoadHtml` method

// SelectSingleNode returns null if no element matches the XPath.
var node = doc.DocumentNode.SelectSingleNode("//div[@class='in']");

// This is the markup you are looking for...
var result = node.OuterHtml;

Answer 2 (score: -2)

Use jQuery to get the contents of the div:

<script type="text/javascript">
       // Runs in the browser and requires jQuery to be loaded on the page.
       var d = $('div.in').html();
</script>

The code above returns the inner HTML of the first div that has the class in.