How do I select certain "nodes" in a text file with HTMLAgilityPack, based on what a line contains?

Time: 2021-02-15 08:01:30

Tags: c# html parsing html-agility-pack

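I know the title says a lot; don't worry, I'll break it down for you. I have a .txt file named TeamName.txt that contains the word "Horsemen". I also have 6 other .txt files containing HTML code, which my code fetches and downloads (this is called Ladder-1-100.txt). Now for the simple part:

The idea is this: the code searches the HTML ladder .txt files for the team name, and that part currently works. However, I also want it to pull out the other information held in specific @class elements. Not specific enough? Let me show you.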

<tr class="vrml_table_row">
    <td class="pos_cell">59</td>
    <td class="div_cell"><img src="/images/div_gold_40.png" title="Gold" /></td>
    <td class="team_cell"><a href="/EchoArena/Teams/RHNkUmJMV1p5UEU90" class="team_link"><img src="/images/logos/teams/9b3b1917-a56b-40a3-80ee-52b1c9f31910.png" class="team_logo" /><span class="team_name">Echoholics</span></a></td>
    <td class="group_cell"><img src="/images/group_ame.png" class="group_logo" title="America East" /></td>
    <td class="gp_cell">14</td>
    <td class="win_cell">10</td>
    <td class="loss_cell">4</td>
    <td class="pts_cell">340</td>
    <td class="mmr_cell"><span>1200</span></td>
</tr>
<tr class="vrml_table_row">
    <td class="pos_cell">60</td>
    <td class="div_cell"><img src="/images/div_diamond_40.png" title="Diamond" /></td>
    <td class="team_cell"><a href="/EchoArena/Teams/cUJmVGlKajFGRlE90" class="team_link"><img src="/images/logos/teams/dff8310a-a429-4c60-af82-0333d530d22d.png" class="team_logo" /><span class="team_name">Horsemen</span></a></td>
    <td class="group_cell"><img src="/images/group_aa.png" class="group_logo" title="Oceania/Asia" /></td>
    <td class="gp_cell">10</td>
    <td class="win_cell">6</td>
    <td class="loss_cell">4</td>
    <td class="pts_cell">235</td>
    <td class="mmr_cell"><span>1200</span></td>
</tr>
<tr class="vrml_table_row">
    <td class="pos_cell">61</td>
    <td class="div_cell"><img src="/images/div_gold_40.png" title="Gold" /></td>
    <td class="team_cell"><a href="/EchoArena/Teams/UDd1dTJQRzBiRzQ90" class="team_link"><img src="/images/logos/teams/8eb6109e-f765-4d64-a766-cc5605a01ad0.png" class="team_logo" /><span class="team_name">Femboys</span></a></td>
    <td class="group_cell"><img src="/images/group_ame.png" class="group_logo" title="America East" /></td>
    <td class="gp_cell">12</td>
    <td class="win_cell">8</td>
    <td class="loss_cell">4</td>
    <td class="pts_cell">348</td>
    <td class="mmr_cell"><span>1200</span></td>
</tr>

Here is my current code, which outputs: Team Name: Horsemen.

HtmlNode[] team_name = document1.DocumentNode
    .SelectSingleNode("//*[@class='vrml_table_row']")
    .SelectNodes("//td[@class='team_cell']")
    .Where(x => x.InnerHtml.Contains($"{TeamName}"))
    .ToArray();

foreach (HtmlNode item in team_name)
{
    await ReplyAsync("**Team Name:** " + item.InnerHtml);
}

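However, I want it to output: Team Name: Horsemen, Wins: 6, Losses: 4, Games Played: 10, MMR: 1200, Points: 235, Division: Diamond, Ladder Position: 60.

You get the idea. As you can see, every row uses the same class names; only the information inside them differs. By the way, the team name (Horsemen) is dynamic, meaning it can be swapped for another team name. So how do I achieve this?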

1 Answer:

Answer 0: (score: 1)

An example solution follows.

First, create a Model class:

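class Model
{
    public int Position { get; set; }
    public string TeamName { get; set; }
    public string ImageSource { get; set; }
    public string Division { get; set; }
    // The properties below are assigned in the parsing loop, so they must be declared too
    public string TeamLink { get; set; }
    public string TeamLogo { get; set; }
    public string GroupLogo { get; set; }
    public string GroupTitle { get; set; }
    // whatever else you want to store
}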

Next, select the desired rows into an HtmlNodeCollection and collect the models in a list:

// Requires: using System.Collections.Generic; using HtmlAgilityPack;
var table = htmlDoc.DocumentNode.SelectNodes("//tr[contains(@class, 'vrml_table_row')]");
var models = new List<Model>();

// SelectNodes returns null (not an empty collection) when nothing matches
if (table != null)
{
    foreach (var t in table)
    {
        // The XPath expressions have no leading "//", so they are evaluated relative to the current row
        var model = new Model
        {
            // Eight values read from the first four columns of the table
            Position = int.Parse(t.SelectSingleNode("td[contains(@class, 'pos_cell')]").InnerText),
            ImageSource = t.SelectSingleNode("td[contains(@class, 'div_cell')]/img").Attributes["src"].Value,
            Division = t.SelectSingleNode("td[contains(@class, 'div_cell')]/img").Attributes["title"].Value,
            TeamLink = t.SelectSingleNode("td[contains(@class, 'team_cell')]/a").Attributes["href"].Value,
            TeamLogo = t.SelectSingleNode("td[contains(@class, 'team_cell')]/a/img").Attributes["src"].Value,
            TeamName = t.SelectSingleNode("td/a/span[contains(@class, 'team_name')]").InnerText,
            GroupLogo = t.SelectSingleNode("td[contains(@class, 'group_cell')]/img").Attributes["src"].Value,
            GroupTitle = t.SelectSingleNode("td[contains(@class, 'group_cell')]/img").Attributes["title"].Value
            // etc.
        };
        models.Add(model);
    }
}
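
To get the exact line the question asks for, the same row-by-row approach can be extended to the remaining cells and then filtered by team name. The following is a minimal sketch rather than part of the original answer: the TeamRow class, the Cell and ParseRow helpers, and the inline sample HTML (two rows copied from the question, with single-quoted attributes to keep the string readable) are illustrative assumptions, and Console.WriteLine stands in for the ReplyAsync call used in the question.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical model holding only the fields the question wants to print.
class TeamRow
{
    public int Position { get; set; }
    public string Division { get; set; }
    public string TeamName { get; set; }
    public int GamesPlayed { get; set; }
    public int Wins { get; set; }
    public int Losses { get; set; }
    public int Points { get; set; }
    public int Mmr { get; set; }
}

static class Program
{
    // Hypothetical helper: trimmed text of the <td> with the given class inside one row.
    static string Cell(HtmlNode row, string cssClass) =>
        row.SelectSingleNode($"td[contains(@class, '{cssClass}')]").InnerText.Trim();

    // Maps one <tr class="vrml_table_row"> to a TeamRow, using row-relative XPath only.
    static TeamRow ParseRow(HtmlNode row) => new TeamRow
    {
        Position = int.Parse(Cell(row, "pos_cell")),
        Division = row.SelectSingleNode("td[contains(@class, 'div_cell')]/img")
                      .GetAttributeValue("title", ""),
        TeamName = row.SelectSingleNode(".//span[contains(@class, 'team_name')]").InnerText.Trim(),
        GamesPlayed = int.Parse(Cell(row, "gp_cell")),
        Wins = int.Parse(Cell(row, "win_cell")),
        Losses = int.Parse(Cell(row, "loss_cell")),
        Points = int.Parse(Cell(row, "pts_cell")),
        Mmr = int.Parse(Cell(row, "mmr_cell")),
    };

    static void Main()
    {
        // Assumption: in the real program these values come from TeamName.txt and Ladder-1-100.txt.
        string teamName = "Horsemen";
        string html = @"<table>
<tr class='vrml_table_row'>
  <td class='pos_cell'>59</td>
  <td class='div_cell'><img src='/images/div_gold_40.png' title='Gold' /></td>
  <td class='team_cell'><a class='team_link'><span class='team_name'>Echoholics</span></a></td>
  <td class='gp_cell'>14</td><td class='win_cell'>10</td><td class='loss_cell'>4</td>
  <td class='pts_cell'>340</td><td class='mmr_cell'><span>1200</span></td>
</tr>
<tr class='vrml_table_row'>
  <td class='pos_cell'>60</td>
  <td class='div_cell'><img src='/images/div_diamond_40.png' title='Diamond' /></td>
  <td class='team_cell'><a class='team_link'><span class='team_name'>Horsemen</span></a></td>
  <td class='gp_cell'>10</td><td class='win_cell'>6</td><td class='loss_cell'>4</td>
  <td class='pts_cell'>235</td><td class='mmr_cell'><span>1200</span></td>
</tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches, so guard before enumerating.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("//tr[contains(@class, 'vrml_table_row')]");
        if (rows == null)
        {
            Console.WriteLine("No ladder rows found.");
            return;
        }

        // Compare the parsed name exactly instead of searching the cell's raw InnerHtml.
        TeamRow team = rows.Select(ParseRow)
            .FirstOrDefault(r => string.Equals(r.TeamName, teamName, StringComparison.OrdinalIgnoreCase));
        if (team == null)
        {
            Console.WriteLine($"Team '{teamName}' is not on this ladder page.");
            return;
        }

        Console.WriteLine(
            $"Team Name: {team.TeamName}, Wins: {team.Wins}, Losses: {team.Losses}, " +
            $"Games Played: {team.GamesPlayed}, MMR: {team.Mmr}, Points: {team.Points}, " +
            $"Division: {team.Division}, Ladder Position: {team.Position}");
    }
}
```

To run this against the downloaded files instead of the inline string, one option is to read the name with File.ReadAllText("TeamName.txt").Trim() and to load each ladder file with doc.Load(path). Matching on the parsed team name rather than on InnerHtml avoids accidental hits on substrings of links or image paths.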