Scraping node attributes with C#

Posted: 2016-08-18 14:40:52

Tags: c# list web-scraping html-agility-pack

I am using C# and HtmlAgilityPack to scrape data from a website, and I have almost got what I want.

Here is the HTML I am scraping:

 <tr class="gtitle"><th colspan="11" class="nobr">Next matches</th></tr>
        <tr class="rtitle first-row"><th class="left first-cell nobr">&nbsp;</th><th class="nobr">&nbsp;</th><th class="nobr">&nbsp;</th><th>&nbsp;</th><th class="bs" title="Number of Bookies">B's</th><th>1</th><th>X</th><th>2</th><th class="col-time nobr">&nbsp;</th><th class="space nobr">&nbsp;</th><th class="col-score last-cell nobr">&nbsp;</th></tr>
        <tr class="match-line first-row"><td class="tl first-cell nobr match-day-after" title="Day after tomorrow match"></td><td class="tl nobr"><a href="/soccer/norway/tippeligaen/valerenga-bodo-glimt/KdDafkm4/">Valerenga - Bodo/Glimt</a></td><td class="livebet nobr">&nbsp;</td><td class="tv">&nbsp;</td><td class="bs">25</td><td class="odds"><span><a href="/my_selections.php?action=3&amp;matchid=KdDafkm4&amp;outcomeid=2aeqnxv464x0x4s2rj&amp;otheroutcomes=2aeqnxv498x0x0,2aeqnxv464x0x4s2rk" onclick="return my_selections_click(this);" title="Add to My Selections" target="mySelections" class="mySelectionsTip" data-odd="1.52"></a></span></td><td class="odds"><span><a href="/my_selections.php?action=3&amp;matchid=KdDafkm4&amp;outcomeid=2aeqnxv498x0x0&amp;otheroutcomes=2aeqnxv464x0x4s2rj,2aeqnxv464x0x4s2rk" onclick="return my_selections_click(this);" title="Add to My Selections" target="mySelections" class="mySelectionsTip" data-odd="4.09"></a></span></td><td class="odds"><span><a href="/my_selections.php?action=3&amp;matchid=KdDafkm4&amp;outcomeid=2aeqnxv464x0x4s2rk&amp;otheroutcomes=2aeqnxv464x0x4s2rj,2aeqnxv498x0x0" onclick="return my_selections_click(this);" title="Add to My Selections" target="mySelections" class="mySelectionsTip" data-odd="5.80"></a></span></td><td class="last-cell nobr right" colspan="3">19.08.2016 19:00</td></tr>

<tr class="match-line strong"><td class="tl first-cell nobr"></td><td class="tl nobr"><a href="/soccer/norway/tippeligaen/lillestrom-haugesund/0htZwU2N/">Lillestrom - Haugesund</a></td><td class="livebet nobr">&nbsp;</td><td class="tv">&nbsp;</td><td class="bs">24</td><td class="odds"><span><a href="/my_selections.php?action=3&amp;matchid=0htZwU2N&amp;outcomeid=2aeqhxv464x0x4s2r7&amp;otheroutcomes=2aeqhxv498x0x0,2aeqhxv464x0x4s2r8" onclick="return my_selections_click(this);" title="Add to My Selections" target="mySelections" class="mySelectionsTip" data-odd="2.34"></a></span></td><td class="odds"><span><a href="/my_selections.php?action=3&amp;matchid=0htZwU2N&amp;outcomeid=2aeqhxv498x0x0&amp;otheroutcomes=2aeqhxv464x0x4s2r7,2aeqhxv464x0x4s2r8" onclick="return my_selections_click(this);" title="Add to My Selections" target="mySelections" class="mySelectionsTip" data-odd="3.40"></a></span></td><td class="odds"><span><a href="/my_selections.php?action=3&amp;matchid=0htZwU2N&amp;outcomeid=2aeqhxv464x0x4s2r8&amp;otheroutcomes=2aeqhxv464x0x4s2r7,2aeqhxv498x0x0" onclick="return my_selections_click(this);" title="Add to My Selections" target="mySelections" class="mySelectionsTip" data-odd="2.83"></a></span></td><td class="last-cell nobr right" colspan="3">20.08.2016 15:30</td></tr>

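I have two problems:
1) I need to split the data in the MatchNM column into HomeNM and HostNM.
2) I need to read the value of the node attribute "data-odd" and put the values into odd1NM, oddXNM and odd2NM.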

Here is the code I wrote.

In Form1:

var url1 = "http://www.betexplorer.com/soccer/norway/tippeligaen/";

    var web1 = new HtmlWeb();
    var doc1 = web1.Load(url1);

    BetsNM = new List<NextMatch>();



    // Read the table rows
    var Rows = doc1.DocumentNode.SelectNodes("//tr");

    foreach (var row in Rows)
    {
        if (!row.GetAttributeValue("class", "").Contains("rtitle"))
        {
            if (string.IsNullOrEmpty(row.InnerText))
                continue;

            var rowBetNM = new NextMatch();
            foreach (var node in row.ChildNodes)
            {
                var data_odd1 = node.GetAttributeValue("data-odd", "");

                if (string.IsNullOrEmpty(data_odd1))
                {
                    if (node.GetAttributeValue("class", "").Contains("tl"))
                    {
                        rowBetNM.MatchNM = node.InnerText.Trim();
                        var matchTeamNM = rowBetNM.MatchNM.Split(new[] { " - " }, StringSplitOptions.RemoveEmptyEntries);
                        //rowBetNM.HomeNM = matchTeamNM[0];
                        //rowBetNM.HostNM = matchTeamNM[1];
                    }


                    if (node.GetAttributeValue("class", "").Contains("last-cell"))
                        rowBetNM.DateNM = node.InnerText.Trim();

                }
                else
                {
                    rowBetNM.OddsNM.Add(data_odd1);
                }
            }

            if (!string.IsNullOrEmpty(rowBetNM.MatchNM))
                BetsNM.Add(rowBetNM);
        }
    }
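
A possible reason OddsNM stays empty, judging only from the HTML above: the data-odd attribute sits on the <a> element nested inside each td.odds > span, whereas the loop calls GetAttributeValue on the row's direct children (the td cells), which do not carry that attribute. One way around this, shown here as a minimal sketch, is to query the row's descendants with XPath. The names OddsReader and ReadOdds are hypothetical and introduced only for illustration; SelectNodes and GetAttributeValue are the same HtmlAgilityPack calls already used in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original code.
static class OddsReader
{
    // Collects the data-odd values of every <a data-odd="..."> found
    // anywhere below the given row, in document order.
    public static List<string> ReadOdds(HtmlNode row)
    {
        // SelectNodes returns null, not an empty collection, when nothing matches.
        var anchors = row.SelectNodes(".//a[@data-odd]");
        if (anchors == null)
            return new List<string>();

        return anchors
            .Select(a => a.GetAttributeValue("data-odd", ""))
            .ToList();
    }
}
```

Inside the foreach over Rows, a call such as rowBetNM.OddsNM.AddRange(OddsReader.ReadOdds(row)); could then take the place of the else branch, assuming the three anchors appear in 1, X, 2 order as the header row suggests.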

NextMatch.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace bexscraping
{
    class NextMatch
    {

        public string MatchNM { get; set; }
        public List<string> OddsNM { get; set; }
        public string DateNM { get; set; }
        public string HomeNM { get; set; }
        public string HostNM { get; set; }



        public string odd1NM { get; set; }
        public string oddXNM { get; set; }
        public string odd2NM { get; set; }

        public NextMatch()
        {
            OddsNM = new List<string>();

        }

        public override string ToString()
        {
            // Two placeholders for two arguments; an unmatched {2} would throw a FormatException.
            String MatchInfo = String.Format("{0}: {1}", DateNM, MatchNM);
            String OddsInfo = String.Empty;
            foreach (string d in OddsNM)
                OddsInfo += " | " + d;

            return MatchInfo + "\n" + OddsInfo;
        }

    }



}
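
For splitting MatchNM and filling odd1NM, oddXNM and odd2NM, here is one hedged sketch. In the HTML above, the first td of each match row also has class tl but contains no text, so splitting its InnerText yields no parts, and indexing [0] and [1] on the result would likely fail. Checking the number of parts first avoids that. The names NextMatchFiller, SplitTeams and OddAt are hypothetical. The sketch assumes that " - " first occurs as the separator between the two teams and that OddsNM holds the odds in 1, X, 2 order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers, not part of the original code.
static class NextMatchFiller
{
    // Splits "Home - Away" on the first " - " only.
    // Returns null when the text is empty or has no separator.
    public static string[] SplitTeams(string match)
    {
        if (string.IsNullOrEmpty(match))
            return null;

        var parts = match.Split(new[] { " - " }, 2, StringSplitOptions.None);
        if (parts.Length != 2)
            return null;

        return new[] { parts[0].Trim(), parts[1].Trim() };
    }

    // Returns the odd at the given position, or "" when the list is too short.
    public static string OddAt(IList<string> odds, int index)
    {
        return index >= 0 && index < odds.Count ? odds[index] : "";
    }
}
```

In the loop this might be used after the row's cells have been read: call SplitTeams(rowBetNM.MatchNM), assign HomeNM and HostNM only when the result is not null, and set odd1NM, oddXNM and odd2NM from OddAt(rowBetNM.OddsNM, 0), OddAt(rowBetNM.OddsNM, 1) and OddAt(rowBetNM.OddsNM, 2).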

I really don't understand where the problem is. Can someone help me? Thanks!

Edit: please check my post again; I have made some corrections.

0 answers