How to parse an HTML document using C#

Asked: 2016-12-15 09:27:33

Tags: c# html html-parsing html-agility-pack

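I need to parse the document shown below. I have been trying HtmlAgilityPack, but I find it very complicated. I need the inner text of this tag: <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Mac Bahsi</td> together with the child elements that belong to it: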

<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')">
<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;6.50;0;Maç Bahsi;164518117')">
<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;1.18;2;Maç Bahsi;164518117')">

<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <style>
        .table1 {
            width: 100%;
            margin: 0px;
            padding: 0px;
            border-collapse: collapse;
            padding: 0px;
        }

        .div1 {
            cursor: pointer;
            margin: 1px;
            border: 1px solid #999999;
            float: left;
            font-size: 12px;
        }

        .td1 {
            text-align: center;
            font-size: 20px;
            font-weight: bold;
            color: #33460E;
            height: 20px;
            padding: 0px;
        }

        .td2 {
            text-align: center;
            font-weight: bold;
            color: #808000;
            padding: 0px;
        }
    </style>
</head>
<body style="background: #FFFFCC;margin: 0px;padding: 0px;font-size: 12px;">
    <p></p>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Mac Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">11.25</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;6.50;0;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">6.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;1.18;2;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.18</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Ilk Yari Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;8.50;1;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">8.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;3.05;0;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">3.05</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;1.50;2;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">İkinci Yarı Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;8.50;1;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">8.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;3.70;0;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">3.70</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;1.40;2;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.40</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <br />
    <br />
    <br />
</body>
</html>

2 answers:

Answer 0 (score: 0)

You can use something like this:

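using System.Linq;
using HtmlAgilityPack;

var document = new HtmlDocument();
document.LoadHtml(text);
// Select only the top-level tables; the nested "table1" tables have no header cell.
// SelectNodes returns null when nothing matches, hence the fallback to an empty sequence.
var tables = document.DocumentNode.SelectNodes("//body/table") ?? Enumerable.Empty<HtmlNode>();
foreach (var table in tables)
{
    // The leading "." keeps the XPath relative to this table; a bare "//" searches the whole document.
    var td = table.SelectSingleNode(".//td[@style='background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;']");
    ...
    // May be null if the table contains no matching div.
    var divs = table.SelectNodes(".//div[@class='div1']");
    ...
}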
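
To make the idea above concrete, here is a minimal sketch that collects the header text of each top-level table together with the odds and label of every `div1` block. It assumes the page keeps the structure shown in the question (header cell in the first row, `td1` for the odds, `td2` for the label). The names `BetTableReader` and `Read` are made up for illustration and are not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper; the class and method names are illustrative only.
public static class BetTableReader
{
    // Returns one (group title, odds text, selection label) tuple per div.div1.
    public static List<Tuple<string, string, string>> Read(string html)
    {
        var result = new List<Tuple<string, string, string>>();
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Only tables directly under <body>; SelectNodes returns null when nothing matches.
        var tables = document.DocumentNode.SelectNodes("//body/table");
        if (tables == null) return result;

        foreach (var table in tables)
        {
            // Assumption: the header cell is in the first row, the divs are in the second.
            var header = table.SelectSingleNode("./tr[1]/td");
            var divs = table.SelectNodes("./tr[2]/td/div[@class='div1']");
            if (header == null || divs == null) continue;

            string title = HtmlEntity.DeEntitize(header.InnerText).Trim();
            foreach (var div in divs)
            {
                var odds = div.SelectSingleNode(".//td[@class='td1']");
                var label = div.SelectSingleNode(".//td[@class='td2']");
                if (odds == null || label == null) continue;
                result.Add(Tuple.Create(title, odds.InnerText.Trim(), label.InnerText.Trim()));
            }
        }
        return result;
    }
}
```

Calling `BetTableReader.Read(text)` on the question's document should yield nine tuples, three per table. Selecting by class rather than by the long `style` string is a design choice here: it keeps working if the inline styling changes.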

Answer 1 (score: 0)

This is how I did it, but it is a long-winded approach. If there is a shorter or better way, please post it.

HtmlWeb h = new HtmlWeb();
HtmlDocument doc = h.Load(Server.MapPath("xml/htmlpage.html"));
// One top-level table per bet type.
HtmlNodeCollection n = doc.DocumentNode.SelectNodes("//html/body/table");

for (int i = 0; i < n.Count; i++)
{
    // XPath starting with "./" is evaluated relative to the current table,
    // so the table index no longer has to be formatted into the expression.
    HtmlNode ns = n[i].SelectSingleNode("./tr[1]/td");
    HtmlNodeCollection nc = n[i].SelectNodes("./tr[2]/td[1]/div");
    Response.Write(string.Format("{0} --> {1}<br/>", i + 1, ns.InnerHtml));
    for (int j = 0; j < nc.Count; j++)
    {
        // Read onclick by name rather than by position (Attributes[2]).
        string item = nc[j].GetAttributeValue("onclick", "");
        // Field order after Split: [4] = oran, [5] = secim, [6] = oranadi, [7] = oran_id followed by "')".
        string[] items = item.Split(';');
        int oran_id = Convert.ToInt32(items[7].Replace("')", ""));
        string oranadi = items[6];
        int secim = Convert.ToInt32(items[5]);
        string oran = items[4];

        Response.Write(string.Format("{0} --> {1} - {2} - {3} - {4} <br/>", j + 1, secim, oran_id, oranadi, oran));
    }
}
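
The splitting of the `onclick` value can also be pulled out into a small helper, which makes the field order explicit and avoids exceptions on malformed input. The following is only a sketch: `SlipEntry` and `SlipParser` are hypothetical names, and the field meanings are inferred from the code above (index 4 = `oran`, 5 = `secim`, 6 = `oranadi`, 7 = `oran_id`). The odds are parsed with `CultureInfo.InvariantCulture` because a Turkish server locale would otherwise expect a comma as the decimal separator.

```csharp
using System;
using System.Globalization;

// Hypothetical container for one selection; property names mirror the variables used above.
public sealed class SlipEntry
{
    public int OranId { get; set; }
    public string OranAdi { get; set; }
    public int Secim { get; set; }
    public decimal Oran { get; set; }
}

public static class SlipParser
{
    // Parses an onclick value such as
    // parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')
    // Returns null when the value does not have the expected shape.
    public static SlipEntry Parse(string onclick)
    {
        if (string.IsNullOrEmpty(onclick)) return null;

        // The payload is the last single-quoted argument
        // (assumes the payload itself contains no single quote).
        int end = onclick.LastIndexOf('\'');
        if (end <= 0) return null;
        int start = onclick.LastIndexOf('\'', end - 1);
        if (start < 0) return null;

        string[] items = onclick.Substring(start + 1, end - start - 1).Split(';');
        if (items.Length < 8) return null;

        int oranId, secim;
        decimal oran;
        var inv = CultureInfo.InvariantCulture;
        if (!int.TryParse(items[7], NumberStyles.Integer, inv, out oranId)) return null;
        if (!int.TryParse(items[5], NumberStyles.Integer, inv, out secim)) return null;
        if (!decimal.TryParse(items[4], NumberStyles.Number, inv, out oran)) return null;

        return new SlipEntry { OranId = oranId, OranAdi = items[6], Secim = secim, Oran = oran };
    }
}
```

Inside the inner loop this would be used as `SlipEntry e = SlipParser.Parse(nc[j].GetAttributeValue("onclick", ""));`, followed by a null check before writing the values out.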