Web scraper in C#. Perhaps a more precise web scraper than this

Time: 2017-07-20 23:21:50

Tags: c#

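I am trying to scrape http://gameinfo.na.leagueoflegends.com/en/game-info/champions/, but I cannot find the champions' images when I search the downloaded page. The problem is that it does not scrape everything... My script is: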

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Net;

namespace WebScraping
{
    class Program
    {
        static void Main(string[] args)
        {
            WebScraping wb = new WebScraping();
            wb.Scraping();
        }

        class WebScraping
        {
            public void Scraping()
            {
                Console.WriteLine("Type in the webpage you want to scrape : \n");
                string WebPage = Console.ReadLine();

                // Downloads the raw HTML only; content added later by JavaScript is not included.
                using (WebClient webc = new WebClient())
                {
                    string html = webc.DownloadString(WebPage);
                    Console.WriteLine(html + "\n \t Done");
                }
                Console.ReadLine();
            }
        }
    }
}

What I am trying to find is <a href="amumu"/></a>

2 answers:

Answer 0 (score: 3)

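You are right: the data is not in the original HTML. Instead, the champions grid is populated by JavaScript. This actually works in your favor; it means you may be able to get your champion information as JSON, which is easier to parse. The only trick is finding where the JavaScript loads it from.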

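To do that, load the page in your browser and use the developer tools. I will use Google Chrome as an example. Press F12 to open the developer tools, then go to the Network tab. Now press Shift+F5 to reload the page and record the requests. Once that is done, you can see every item that was downloaded to render the page. I saw 238 requests in total (that's a lot!), but if you browse through the JSON items in the list, you will eventually see a champion.json file. Right-click it and you can get this URL: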

  

http://ddragon.leagueoflegends.com/cdn/6.24.1/data/en_US/champion.json
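Once you have that URL, you can point the same WebClient approach from the question at it instead of at the HTML page. This is only a minimal sketch: `BuildUrl` is a hypothetical helper, and the idea that other versions or locales follow the same path pattern is an assumption based on this one URL.

```csharp
using System;
using System.Net;
using System.Text;

static class ChampionDownload
{
    // Hypothetical helper: rebuilds the URL above from its version and locale parts.
    // Assumption: other versions/locales use the same path layout.
    public static string BuildUrl(string version, string locale)
    {
        return "http://ddragon.leagueoflegends.com/cdn/" + version
             + "/data/" + locale + "/champion.json";
    }

    static void Main()
    {
        string url = BuildUrl("6.24.1", "en_US");

        using (var client = new WebClient())
        {
            // Read the response as UTF-8 so non-ASCII characters survive.
            client.Encoding = Encoding.UTF8;
            string json = client.DownloadString(url);
            Console.WriteLine("Downloaded " + json.Length + " characters from " + url);
        }
    }
}
```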

Looking at the data in that file, you will find:

"Amumu":
{
    "version":"6.24.1",
    "id":"Amumu",
    "key":"32",
    "name":"Amumu",
    "title":"the Sad Mummy",
    "blurb":"''Solitude can be lonelier than death.''<br><br>A lonely and melancholy soul from ancient Shurima, Amumu roams the world in search of a friend. Cursed by an ancient spell, he is doomed to remain alone forever, as his touch is death and his affection ...",
    "info":
    {
        "attack":2,
        "defense":6,
        "magic":8,
        "difficulty":3
    },
    "image":
    {
        "full":"Amumu.png",
        "sprite":"champion0.png",
        "group":"champion",
        "x":192,
        "y":0,
        "w":48,
        "h":48
    },
    "tags":["Tank","Mage"],
    "partype":"MP",
    "stats":
    {
        "hp":613.12,
        "hpperlevel":84.0,
        "mp":287.2,
        "mpperlevel":40.0,
        "movespeed":335.0,
        "armor":23.544,
        "armorperlevel":3.8,
        "spellblock":32.1,
        "spellblockperlevel":1.25,
        "attackrange":125.0,
        "hpregen":8.875,
        "hpregenperlevel":0.85,
        "mpregen":7.38,
        "mpregenperlevel":0.525,
        "crit":0.0,
        "critperlevel":0.0,
        "attackdamage":53.384,
        "attackdamageperlevel":3.8,
        "attackspeedoffset":-0.02,
        "attackspeedperlevel":2.18
    }
}

Pull in a JSON parser with NuGet and you can quickly get structured data out of this.
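As a rough illustration of that last step, here is a sketch using System.Text.Json (built into recent .NET versions and available as a NuGet package for older frameworks); Json.NET would work much the same way. Two things here are assumptions not shown above: that the champion entries sit under a top-level "data" object, and that the image files are served from the `img/champion/` folder of the same CDN version. `ChampionImages` and `Parse` are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

static class ChampionImages
{
    // Assumption: image files live in this folder for the same version.
    const string ImageBase = "http://ddragon.leagueoflegends.com/cdn/6.24.1/img/champion/";

    // Maps each champion id to the URL built from its "image.full" file name.
    public static Dictionary<string, string> Parse(string json)
    {
        var images = new Dictionary<string, string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // Assumption: the champion entries live under a top-level "data" object.
            JsonElement data = doc.RootElement.GetProperty("data");
            foreach (JsonProperty champion in data.EnumerateObject())
            {
                string id = champion.Value.GetProperty("id").GetString();
                string file = champion.Value.GetProperty("image").GetProperty("full").GetString();
                images[id] = ImageBase + file;
            }
        }
        return images;
    }

    static void Main()
    {
        // Trimmed-down sample in the shape of the Amumu entry; real code would pass the downloaded file.
        string sample = "{\"data\":{\"Amumu\":{\"id\":\"Amumu\",\"image\":{\"full\":\"Amumu.png\"}}}}";
        foreach (KeyValuePair<string, string> pair in Parse(sample))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

Walking the "data" object property by property avoids declaring a class for every field; if you need the stats as well, deserializing into your own types may be more convenient.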

Answer 1 (score: 0)

Regex helped me match the information I needed:

 MatchCollection m1 = Regex.Matches(html, "\"id\":\"(.+?)\",\"", RegexOptions.Singleline);
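For completeness, here is how that line might be used to read the captured ids. This is a sketch, not the answerer's full program: `ChampionIds` and `Extract` are hypothetical names, and the pattern only works while the JSON is minified with "id" immediately followed by another key, so it is more fragile than a real JSON parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ChampionIds
{
    // Returns every value captured by the "id":"..." pattern, in document order.
    public static List<string> Extract(string html)
    {
        var ids = new List<string>();
        MatchCollection matches = Regex.Matches(html, "\"id\":\"(.+?)\",\"", RegexOptions.Singleline);
        foreach (Match match in matches)
        {
            // Group 1 is the lazily matched text between the quotes.
            ids.Add(match.Groups[1].Value);
        }
        return ids;
    }

    static void Main()
    {
        // Made-up minified sample; real input would be the downloaded champion.json text.
        string sample = "{\"Amumu\":{\"id\":\"Amumu\",\"key\":\"32\"},\"Annie\":{\"id\":\"Annie\",\"key\":\"1\"}}";
        foreach (string id in Extract(sample))
        {
            Console.WriteLine(id);
        }
    }
}
```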