C# HtmlAgilityPack: how do I get the InnerText of every occurrence of a specific tag?

Asked: 2017-06-22 01:23:38

Tags: c# html html-agility-pack innertext selectnodes

As the title briefly explains, I am trying to grab the InnerText of every occurrence of a tag and add each one to a List. Here is my code along with my HTML:

Body of the HTML:

<body cz-shortcut-listen="true">
{"draw":1,"recordsTotal":9437,"recordsFiltered":9437,"data":[["
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">AK-47 | Aquamarine Revenge (Factory New)&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;"href="\&quot;\/id\/115739257\&quot;">33.87&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">34.53&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;https:\/\/track.steamanalyst.com\/730\/115739257\/all\&quot;">25.9&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">164&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">-0.16&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">2.10945&lt;\/a&gt;"],["</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">AK-47 | Aquamarine Revenge (Minimal Wear)&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">23.44&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">21.85&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;https:\/\/track.steamanalyst.com\/730\/115734122\/all\&quot;">17.61&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">533&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">-2.65&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">0.94387&lt;\/a&gt;"],["</a>
</body>

My code:

// j (an int starting at 0) and JsonDB are declared elsewhere.
List<string> Data = new List<string>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a[@target]"))
{
    if(j <= 6)
    {
        Data.Add(node.InnerText);
        if (j == 6)
        {
            JsonDB.Add(Data[0], Data[1]);
            Data.Clear();
            j = 0;
        }
        else
        {
            j++;
        }
    }
}

The problem with this code: node.InnerText returns the InnerText of every tag in the body concatenated into one string! This is what it shows for the first node in doc.DocumentNode.SelectNodes("//a[@target]"):

AK-47 | Aquamarine Revenge (Factory New)","33.8","34.34","25.89","170",
"-1.27","2.03181"],[...

2 Answers:

Answer 0: (score: 0)

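All matching tags in the whole document (the path starts at the document root):

doc.DocumentNode.SelectNodes("//a[@target]")

Matching tags under the current node only (the path starts at the context node):

doc.DocumentNode.SelectNodes(".//a[@target]")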
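The difference between the two paths only shows up when SelectNodes is called on a node other than the document root. The following is a minimal sketch of that behaviour, assuming the HtmlAgilityPack NuGet package is referenced; the class name XPathScopeDemo and the sample markup are hypothetical and not taken from the question.

```csharp
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Hypothetical sample markup: one anchor in the first div, two in the second.
    private const string Sample =
        "<div id='a'><a target='x'>one</a></div>" +
        "<div id='b'><a target='y'>two</a><a target='z'>three</a></div>";

    // Counts anchors found from the second div with an absolute path ("//")
    // and with a relative path (".//").
    public static (int Absolute, int Relative) CountAnchors()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Sample);

        HtmlNode second = doc.DocumentNode.SelectSingleNode("//div[@id='b']");

        // "//" restarts at the document root, so anchors outside 'second' are included.
        int absolute = second.SelectNodes("//a[@target]").Count;

        // ".//" is evaluated relative to 'second', so only its descendants are included.
        int relative = second.SelectNodes(".//a[@target]").Count;

        return (absolute, relative);
    }
}
```

When the call is made on doc.DocumentNode itself, as in the question, both paths select the same nodes, so this distinction alone would not be expected to change the result there.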

Answer 1: (score: 0)

Solution: the response has to be treated as a JSON object before any of it is parsed as HTML:

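// Requires Newtonsoft.Json (Newtonsoft.Json.Linq) and HtmlAgilityPack.
// response is the raw JSON text; doc is an HtmlAgilityPack HtmlDocument.
JObject jresponse = JObject.Parse(response);
foreach (JArray row in jresponse["data"])
{
    List<string> Data = new List<string>();
    foreach (JToken entry in row)
    {
        // Each cell is a small HTML fragment of the form <a ...>text</a>.
        doc.LoadHtml(entry.ToString());
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@target]");
        if (node != null)
        {
            Data.Add(node.InnerText);
        }
    }
    // Data now holds the cell texts of one row.
}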
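
The approach above can be sketched as a self-contained helper. The body in the question is a JSON document whose cells contain escaped HTML (\" and \/), so an HTML parser run over the whole response does not see the escaped <\/a> sequences as proper closing tags, which is the likely reason the first node's InnerText swallows everything after it. This sketch assumes the Newtonsoft.Json and HtmlAgilityPack packages; the names RowParser and ParseRows are hypothetical, and the sample string is a shortened reconstruction of the response in the question rather than the real payload.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;
using Newtonsoft.Json.Linq;

public static class RowParser
{
    // Shortened, reconstructed sample: one row with two cells (name and price).
    public const string Sample =
        @"{""draw"":1,""data"":[[" +
        @"""<a target=\""_blank\"" href=\""\/id\/115739257\"">AK-47 | Aquamarine Revenge (Factory New)<\/a>""," +
        @"""<a target=\""_blank\"" href=\""\/id\/115739257\"">33.87<\/a>""" +
        @"]]}";

    // Returns one list of cell texts per row of the "data" array.
    public static List<List<string>> ParseRows(string json)
    {
        var rows = new List<List<string>>();
        var data = JObject.Parse(json)["data"] as JArray;
        if (data == null)
        {
            return rows; // no "data" array in the response
        }

        foreach (JToken row in data)
        {
            var cells = new List<string>();
            foreach (JToken cell in row)
            {
                // After JSON decoding, each cell is a plain HTML fragment.
                var fragment = new HtmlDocument();
                fragment.LoadHtml((string)cell);

                HtmlNode anchor = fragment.DocumentNode.SelectSingleNode("//a[@target]");
                // Fall back to the fragment's text if a cell has no anchor.
                cells.Add(anchor != null ? anchor.InnerText : fragment.DocumentNode.InnerText);
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

With the sample above, RowParser.ParseRows(RowParser.Sample)[0] would hold the item name followed by its price, which matches the Data[0] and Data[1] pair that the question passes to JsonDB.Add.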