Parsing HTML in C#

Date: 2018-06-02 08:04:32

Tags: c# html parsing

Hi, I have an HTML table in the following format:

<table id="archive_regulation" class="table table-striped table-hover">
    <thead>
    <tr>
        <th>row</th>
        <th>category</th>
        <th>title</th>
        <th>date</th>
        <th>type</th>
        <th>sub type</th>
        <th>download</th>
    </tr>
    </thead>
    <tr>
        <td>1</td>
        <td>test</td>
        <td>some thing for test</td>
        <td>۱۳۹۷/۰۳/۱۲</td>
        <td>general</td>
        <td>other</td>
        <td>
            <a class="btn btn-info" href="http://10.30.170.46/portal/fileLoader.php?code=91a1135b8898b8db76ada81527923329">
                <span aria-hidden="true" class="glyphicon glyphicon-download"></span>
                <b> download</b>
            </a>
        </td>
    </tr>
    <tr>
        <td>2</td>
        <td>something for test</td>
        <td>another thing for test</td>
        <td>۱۳۹۷/۰۳/۱۲</td>
        <td>vip</td>
        <td>new</td>
        <td>
            <a class="btn btn-info" href="http://10.30.170.46/portal/fileLoader.php?code=164f564d81e7f3cc3c18091527922677">
                <span aria-hidden="true" class="glyphicon glyphicon-download"></span>
                <b> download</b>
            </a>
        </td>
    </tr>
</table>

Now I want to extract the items like this:

  

1
test
some thing for test
۱۳۹۷/۰۳/۱۲
general
other
http://10.30.170.46/portal/fileLoader.php?code=91a1135b8898b8db76ada81527923329
2
something for test
another thing for test
۱۳۹۷/۰۳/۱۲
vip
new
http://10.30.170.46/portal/fileLoader.php?code=164f564d81e7f3cc3c18091527922677

I used HtmlAgilityPack, but my selection pattern is not correct:

WebClient webClient = new WebClient();
string page = webClient.DownloadString("http://5743.zanjan.medu.ir/regulation/archive?ocode=100038170");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
{
    Console.WriteLine("Found: " + table.Id);
    foreach (HtmlNode row in table.SelectNodes("tr"))
    {
        Console.WriteLine("row");
        foreach (HtmlNode cell in row.SelectNodes("th|td"))
        {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}

What should I do?

1 Answer:

Answer 0 (score: 0)

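The code in the question does something different from what you are asking for.

The following code returns the output you asked for. Note that the URL you fetch the HTML from did not work for me. (Don't forget using System.Linq.)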

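using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Download the page (same URL as in the question).
string page;
using (var webClient = new WebClient())
{
    page = webClient.DownloadString("http://5743.zanjan.medu.ir/regulation/archive?ocode=100038170");
}

var doc = new HtmlDocument();
doc.LoadHtml(page);

var output = "";

// Look the table up by its id; GetElementbyId returns null if it is missing.
var table = doc.GetElementbyId("archive_regulation");
if (table == null)
{
    Console.WriteLine("Table 'archive_regulation' not found.");
    return;
}

// Walk every data cell: emit the link target for the download cell, the cell text otherwise.
foreach (HtmlNode td in table.Descendants("td"))
{
    var anchor = td.Descendants("a").FirstOrDefault();

    if (anchor != null)
        output += anchor.GetAttributeValue("href", string.Empty);
    else
        output += td.InnerText.Trim();

    output += "\n";
}

Console.WriteLine(output);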
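
If you would rather keep the cells grouped per row instead of building one flat string, something along the following lines might work. This is only a sketch: it assumes the HtmlAgilityPack NuGet package is referenced, and the class name RegulationTableParser, the method ParseRows, and the trimmed inline sample HTML are made up for illustration. It parses a string instead of downloading the page, so it can be tried without the URL being reachable.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class RegulationTableParser
{
    // Returns one list per data row. Each entry is the trimmed cell text,
    // except that a cell containing a link is replaced by the link's href.
    public static List<List<string>> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // "tr[td]" keeps only rows that have at least one <td>, which skips the header row.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection trNodes =
            doc.DocumentNode.SelectNodes("//table[@id='archive_regulation']//tr[td]");
        if (trNodes == null)
            return rows;

        foreach (HtmlNode tr in trNodes)
        {
            var cells = new List<string>();
            foreach (HtmlNode td in tr.Elements("td"))
            {
                HtmlNode anchor = td.SelectSingleNode(".//a[@href]");
                cells.Add(anchor != null
                    ? anchor.GetAttributeValue("href", string.Empty)
                    : HtmlEntity.DeEntitize(td.InnerText).Trim());
            }
            rows.Add(cells);
        }

        return rows;
    }

    public static void Main()
    {
        // Trimmed copy of the table from the question (assumption: single-quoted
        // attributes are used here only to keep the C# string literal readable).
        const string html = @"
<table id='archive_regulation'>
  <thead><tr><th>row</th><th>category</th><th>download</th></tr></thead>
  <tr>
    <td>1</td>
    <td>test</td>
    <td><a class='btn btn-info' href='http://10.30.170.46/portal/fileLoader.php?code=91a1135b8898b8db76ada81527923329'><b> download</b></a></td>
  </tr>
</table>";

        foreach (List<string> row in ParseRows(html))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Returning a list per row makes it easier to map each row to an object later, and the null check matters because SelectNodes does not return an empty collection when the table is missing.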