System.Collections.Generic.List`1 [System.String]在Webscraping时

时间:2016-12-04 10:46:04

标签: c# linq html-agility-pack

目前有一个问题我不能让C#将我的列表输出到可读的内容中,这意味着我实际上无法看到webscraping是否真的有效或者提取不正确的信息。

任何人都知道如何将 System.Collections.Generic.List1` [System.String] 更改为可读的内容?

using HtmlAgilityPack;
using NScrape.Forms;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;

namespace FulcrumBotManager
{
    class Program
    {
        static void Main(string[] args)
        {

            WebClient webClient = new WebClient();
            string download = webClient.DownloadString("http://localhost:1013");

            HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
            html.LoadHtml(download);
            List<List<string>> table = html.DocumentNode.SelectSingleNode("//table")
                        .Descendants("tr")
                        .Skip(1)
                        .Where(tr => tr.Elements("td").Count() > 1)
                        .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
                        .ToList();


           table.ForEach(Console.WriteLine);
        }
    }
}

HTML正在被抓取

&#13;
&#13;
<!DOCTYPE html>
<!-- saved from url=(0022)http://localhost:1013/ -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta http-equiv="refresh" content="5">
<style>#accounts {font-family: "Trebuchet MS", Arial, Helvetica, sans-serif;border-collapse: collapse;margin: 0px auto;}#accounts td, #customers th {font-size: 1em;border: 1px solid #98bf21;padding: 3px 7px 2px 7px;}#accounts th {font-size: 1.1em;text-align: left;padding-top: 5px;padding-bottom: 4px;background-color: #A7C942;color: #ffffff;}#accounts tr.alt td {color: #000000;background-color: #EAF2D3;}</style>
</head>
<body>
<table id="accounts">
<tbody>
<tr>
<th>Run</th>
<th>Region</th>
<th>Username</th>
<th>Max IP</th>
<th>Game</th>
<th>Spell 1</th>
<th>Spell 2</th>
<th>Summoner</th>
<th>Lvl</th>
<th>Total IP</th>
<th>Total RP</th>
<th>Status</th>
</tr>
<tr>
<td>False</td>
<td>EUW</td>
<td>sage</td>
<td>0</td>
<td>ARAM</td>
<td>Barrier</td>
<td>Heal</td>
<td>Summoner</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>10:26:45 Disconnected</td>
</tr><tr class="alt">
<td>False</td>
<td>EUW</td>
<td>wily</td>
<td>0</td>
<td>ARAM</td>
<td>Barrier</td>
<td>Heal</td>
<td>Summoner</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>10:26:45 Disconnected</td>
</tr><tr>
<td>False</td>
<td>EUW</td>
<td>miles</td>
<td>0</td>
<td>ARAM</td>
<td>Barrier</td>
<td>Heal</td>
<td>Summoner</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>10:26:46 Disconnected</td>
</tr><tr class="alt">
<td>False</td>
<td>EUW</td>
<td>cookie</td>
<td>0</td>
<td>ARAM</td>
<td>Barrier</td>
<td>Heal</td>
<td>Summoner</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>10:26:47 Disconnected</td>
</tr><tr>
<td>False</td>
<td>EUW</td>
<td>lazors</td>
<td>0</td>
<td>ARAM</td>
<td>Barrier</td>
<td>Heal</td>
<td>Summoner</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>10:26:48 Disconnected</td>
</tr></tbody>
</table>
<center>Updated core files | 00:50:56:C0:00:01 00:50:56:C0:00:08 00:21:CC:73:5B:BF 08:11:96:F7:A7:0C 00:FF:8B:11:85:F4 <br>Refreshes every 5 seconds</center>
</body>
</html>
&#13;
&#13;
&#13;

2 个答案:

答案 0 :(得分:2)

table.ForEach(x => x.ForEach(Console.WriteLine));

马丁的解释是正确的,只是除了它:你可以用LINQ这样做。

答案 1 :(得分:1)

基本上你拥有的是string :-)列表。这意味着它是&#34;两级层次结构&#34;。

在当前状态中,您只是枚举和编写每个内部列表本身。由于Console.WriteLine不熟悉Lists,因此只需在实例上调用ToString(),即输出类型名称。

你真正想要的是枚举内部列表:

//enumerate all lists in the outer list
foreach ( var list in table )
{
   //enumerate the inner list
   foreach ( var item in list )
   {
        //output the actual item
        Console.WriteLine( item );
   }
}