Efficient way to increase a "code" by 1 - HtmlAgilityPack

Time: 2016-12-21 04:59:27

Tags: c# wpf html-agility-pack visual-studio-2015

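I'm developing an application that extracts content from a game page (example) and displays it to the user in a text box. If the user wishes, he/she can save it as a .txt file or an .xls file (Excel spreadsheet format).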

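The main problem I'm facing right now is that I have to change the code by hand to "extract" the data for each of the other units in the game.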

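If you open the link, you'll see that I'm extracting "Weapons", "Used", "Survived" and "Casualties" from the Defender side (for now), but only one type of unit (that is, a single row of the table) gets extracted. I'm looking for a way to search from "tr[1]/td[2]/span[1]" through "tr[45]/td[2]/span[1]" (even though the example page only goes up to tr[16]), or perhaps a way to keep searching automatically until no more data is found and then stop.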

Sorry for any language mistakes; I'm not a native speaker.

private void btnStart_Click(object sender, RoutedEventArgs e)
    {
        HtmlDocument brPage = new HtmlWeb().Load("http://us.desert-operations.com/world2/battleReport.php?code=f8d77b1328c8ce09ec398a78505fc465");
        HtmlNodeCollection nodes = brPage.DocumentNode.SelectNodes("/html[1]/body[1]/div[1]/div[1]/div[3]/div[1]/div[1]/div[1]/div[2]/table[2]");
        string result = "";
        List<brContentSaver> ContentList = new List<brContentSaver>();
        foreach (var item in nodes)
        {
            brContentSaver cL = new brContentSaver();
            /*  Here comes the junk handler, replaces all junk for nothing, essentially deleting it
                I wish I knew a way to do this efficiently  */
            cL.Weapons = item.SelectSingleNode("tr[16]/td[1]").InnerText
                .Replace("&nbsp;*&nbsp;", " ")
                .Replace("&nbsp ; *&nbsp ;", " ");

            cL.Used = item.SelectSingleNode("tr[16]/td[2]/span[1]").InnerText
                .Replace("&nbsp;*&nbsp;", " ")
                .Replace("&nbsp ; *&nbsp ;", " ");

            cL.Survived = item.SelectSingleNode("tr[16]/td[3]").InnerText
                .Replace("&nbsp;*&nbsp;", " ")
                .Replace("&nbsp ; *&nbsp ;", " ");

            if (cL.Survived == "0")
            {
                cL.Casualties = cL.Used;
            } else
            {
                /*  int Casualties = int.Parse(cL.Casualties);
                 *  int Used = int.Parse(cL.Used);
                 *  int Survived = int.Parse(cL.Survived);

                 *  Casualties = Used - Survived;   */

                 cL.Casualties = item.SelectSingleNode("tr[16]/td[4]").InnerText
                 .Replace("&nbsp;*&nbsp;", " ")
                 .Replace("&nbsp ; *&nbsp ;", " ");
            }

            ContentList.Add(cL);
        }

        foreach (var item in ContentList)
        {
            result += item.Weapons + " " + item.Used + " " + item.Survived + " " + item.Casualties + Environment.NewLine;
        }
        brContent.Text = result;

    }
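The repeated Replace chains in the handler above could be folded into a single helper. This is only a sketch: `CellText.Clean` is a hypothetical name, and it assumes the junk is always an entity-encoded non-breaking space, optionally wrapped around an asterisk, as the replaced strings suggest. It relies on `WebUtility.HtmlDecode` from System.Net rather than on anything in HtmlAgilityPack, and unlike the original chain it also trims the ends of the text.

```csharp
using System;
using System.Net;

public static class CellText
{
    // Hypothetical helper, not part of HtmlAgilityPack.
    // Assumption: the only junk in a cell is "&nbsp;" (optionally around "*").
    public static string Clean(string raw)
    {
        if (string.IsNullOrEmpty(raw)) return string.Empty;

        // "&nbsp;" decodes to the non-breaking space character U+00A0.
        string decoded = WebUtility.HtmlDecode(raw);

        return decoded
            .Replace("\u00A0*\u00A0", " ")  // the "&nbsp;*&nbsp;" marker
            .Replace('\u00A0', ' ')         // any remaining non-breaking spaces
            .Trim();
    }
}
```

Each assignment in the handler would then shrink to something like `cL.Weapons = CellText.Clean(item.SelectSingleNode("tr[16]/td[1]").InnerText);`.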

Sorry if this sounds stupid, but I'm new to programming, especially in C#.

Edit 1: I noticed the `if (cL.Survived == "0")`. I was just testing something earlier and forgot to change it, but hey, it works.
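The commented-out arithmetic in the handler (Casualties = Used - Survived) could stand in for that string comparison. The sketch below is one way to do it, assuming the cells hold whole numbers that may carry comma thousands separators. That number format is a guess, since the page itself is not shown here, and `BattleMath.Casualties` is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class BattleMath
{
    // Hypothetical helper: returns used - survived as text,
    // or null when either input is not a whole number.
    public static string Casualties(string used, string survived)
    {
        // Assumption: digits with optional "," group separators and padding.
        const NumberStyles style = NumberStyles.AllowThousands
                                 | NumberStyles.AllowLeadingWhite
                                 | NumberStyles.AllowTrailingWhite;
        long u;
        long s;
        if (long.TryParse(used, style, CultureInfo.InvariantCulture, out u) &&
            long.TryParse(survived, style, CultureInfo.InvariantCulture, out s))
        {
            return (u - s).ToString(CultureInfo.InvariantCulture);
        }
        return null; // caller can fall back to the text of the fourth cell
    }
}
```

Returning null instead of throwing lets the caller keep the existing fallback of reading td[4] when a cell holds something unexpected.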

Edit 2: In case you're wondering, I'm also using this:

public class brContentSaver
{

    public string Weapons
    {
        get;
        set;
    }

    public string Used
    {
        get;
        set;
    }

    public string Survived
    {
        get;
        set;
    }
    public string Casualties
    {
        get;
        set;
    }
}

1 Answer:

Answer 0 (score: 0)

I don't have much time to write this, but hopefully it helps if you still need it. I find LINQ more convenient:

private static void Run()
{
    HtmlDocument brPage = new HtmlWeb().Load("http://us.desert-operations.com/world2/battleReport.php?code=f8d77b1328c8ce09ec398a78505fc465");
    // Requires: using System.Linq; using HtmlAgilityPack;
    // Select every table whose class attribute contains "battleReport".
    var nodes = brPage.DocumentNode.Descendants("table")
        .Where(t => t.GetAttributeValue("class", "").Contains("battleReport"));
    string result = "";
    List<brContentSaver> ContentList = new List<brContentSaver>();
    foreach (var item in nodes)
    {
        if (item.Descendants("th").Any(_ => _.InnerText.Equals("Weapons")))
        {
            //get all tr nodes except first one (header)
            var trNodes = item.Descendants("tr").Skip(1);
            foreach (var node in trNodes)
            {
                brContentSaver cL = new brContentSaver();
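                var tds = node.Descendants("td").ToArray();
                // Skip rows that lack the four expected cells, so the indexing below cannot throw.
                if (tds.Length < 4) continue;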
                /*  Here comes the junk handler, replaces all junk for nothing, essentially deleting it
                    I wish I knew a way to do this efficiently  */
                cL.Weapons = tds[0].InnerText
                    .Replace("&nbsp;*&nbsp;", " ")
                    .Replace("&nbsp ; *&nbsp ;", " ");

                cL.Used = tds[1].Descendants("span").FirstOrDefault()?.InnerText
                    .Replace("&nbsp;*&nbsp;", " ")
                    .Replace("&nbsp ; *&nbsp ;", " ");
                if (string.IsNullOrEmpty(cL.Used))
                {
                    cL.Used = tds[1].InnerText;
                }

                cL.Survived = tds[2].Descendants("span").FirstOrDefault()?.InnerText
                    .Replace("&nbsp;*&nbsp;", " ")
                    .Replace("&nbsp ; *&nbsp ;", " ");

                if (string.IsNullOrEmpty(cL.Survived))
                {
                    cL.Casualties = cL.Used;
                }
                else
                {
                    /*  int Casualties = int.Parse(cL.Casualties);
                     *  int Used = int.Parse(cL.Used);
                     *  int Survived = int.Parse(cL.Survived);

                     *  Casualties = Used - Survived;   */

                    cL.Casualties = tds[3].Descendants("span").FirstOrDefault()?.InnerText
                    .Replace("&nbsp;*&nbsp;", " ")
                    .Replace("&nbsp ; *&nbsp ;", " ");

                    if (string.IsNullOrEmpty(cL.Casualties))
                    {
                        cL.Casualties = tds[3].InnerText;
                    }
                }

                ContentList.Add(cL);
            }
        }
    }

    foreach (var item in ContentList)
    {
        result += item.Weapons + " " + item.Used + " " + item.Survived + " " + item.Casualties + Environment.NewLine;
    }
    // Stand-in for the UI; in the WPF handler, assign this to brContent.Text instead.
    var text = result;

}
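If you would rather stay with XPath than LINQ, the same "every row until there are no more" behavior can be sketched as below. `RowWalker.ReadUsedColumn` is a hypothetical name, and the sketch assumes the tr elements are direct children of the table node, as the relative paths in the question imply. It uses only `SelectNodes`, `SelectSingleNode`, `InnerText` and `HtmlEntity.DeEntitize` from HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class RowWalker
{
    // Hypothetical helper: collects the text of td[2]/span[1] from every
    // row of the table, however many rows there are.
    public static List<string> ReadUsedColumn(HtmlNode table)
    {
        var values = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes("tr");
        if (rows == null) return values;

        foreach (HtmlNode row in rows)
        {
            // Relative XPath, evaluated against the current row.
            HtmlNode cell = row.SelectSingleNode("td[2]/span[1]");
            if (cell == null) continue; // header row or a row without a span

            values.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
        }
        return values;
    }
}
```

In the question's handler this would be called once per table, for example `RowWalker.ReadUsedColumn(item)` inside the foreach over nodes, so no row number has to be hard-coded.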