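I am trying to scrape http://gameinfo.na.leagueoflegends.com/en/game-info/champions/, but I cannot find the champion images in the downloaded page source. The problem is that it does not scrape everything. My script is: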
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.IO;
    using System.Net;

    namespace WebScraping
    {
        class Program
        {
            static void Main(string[] args)
            {
                WebScraping wb = new WebScraping();
                wb.Scraping();
            }

            class WebScraping
            {
                public void Scraping()
                {
                    Console.WriteLine("Type in the webpage you want to scrape : \n");
                    string WebPage = Console.ReadLine();
                    WebClient webc = new WebClient();
                    // Despite its name, url holds the downloaded HTML text of the page.
                    string url = webc.DownloadString(WebPage);
                    Console.WriteLine(url += "\n \t Done");
                    Console.ReadLine();
                }
            }
        }
    }
What I am trying to find is <a href="amumu"/></a>
Answer 0: (score: 3)
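You're right: the data isn't in the original HTML. Instead, the Champions Grid is populated via JavaScript. This actually works in your favor: it means you can probably get the champion information in JSON format, which is much easier to parse. The only trick is finding where the JavaScript loads it from.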
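To do this, load the page in your browser and use the developer tools. I'll use Google Chrome as an example. Press F12 to open the developer tools, then go to the Network tab. Now press Shift+F5 to reload the page and record the requests. Once that's done, you can see every item that was downloaded to render the page. I saw 238 requests in total (that's a lot!), but if you look through the JSON items in the list, you'll eventually see a champion.json file. Right-click it and you can get this URL: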
http://ddragon.leagueoflegends.com/cdn/6.24.1/data/en_US/champion.json
Looking at the data in that file, you'll find:
    "Amumu":
    {
        "version":"6.24.1",
        "id":"Amumu",
        "key":"32",
        "name":"Amumu",
        "title":"the Sad Mummy",
        "blurb":"''Solitude can be lonelier than death.''<br><br>A lonely and melancholy soul from ancient Shurima, Amumu roams the world in search of a friend. Cursed by an ancient spell, he is doomed to remain alone forever, as his touch is death and his affection ...",
        "info":
        {
            "attack":2,
            "defense":6,
            "magic":8,
            "difficulty":3
        },
        "image":
        {
            "full":"Amumu.png",
            "sprite":"champion0.png",
            "group":"champion",
            "x":192,
            "y":0,
            "w":48,
            "h":48
        },
        "tags":["Tank","Mage"],
        "partype":"MP",
        "stats":
        {
            "hp":613.12,
            "hpperlevel":84.0,
            "mp":287.2,
            "mpperlevel":40.0,
            "movespeed":335.0,
            "armor":23.544,
            "armorperlevel":3.8,
            "spellblock":32.1,
            "spellblockperlevel":1.25,
            "attackrange":125.0,
            "hpregen":8.875,
            "hpregenperlevel":0.85,
            "mpregen":7.38,
            "mpregenperlevel":0.525,
            "crit":0.0,
            "critperlevel":0.0,
            "attackdamage":53.384,
            "attackdamageperlevel":3.8,
            "attackspeedoffset":-0.02,
            "attackspeedperlevel":2.18
        }
    }
Pull in a JSON parser via NuGet and you can quickly get structured data out of it.
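As a rough sketch of that last step, the C# below uses System.Text.Json (one possible parser; Json.NET from NuGet would work along the same lines) to pull each champion's id and image file name out of the downloaded text. The names ChampionParser, Parse, ImageUrl and SampleJson are hypothetical. Two things are assumptions that the excerpt above does not show: that the champion entries sit under a top-level "data" property, and that the portraits are served from an img/champion path under the same version folder.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ChampionParser
{
    // Trimmed stand-in for champion.json, reduced to the fields used here.
    // Assumption: the real file wraps the champion entries in a "data" object.
    public const string SampleJson =
        "{\"data\":{\"Amumu\":{\"version\":\"6.24.1\",\"id\":\"Amumu\"," +
        "\"image\":{\"full\":\"Amumu.png\",\"sprite\":\"champion0.png\"}}}}";

    // Returns (id, image file name) pairs, one per champion entry.
    public static List<(string Id, string ImageFile)> Parse(string json)
    {
        var result = new List<(string Id, string ImageFile)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement data = doc.RootElement.GetProperty("data");
            foreach (JsonProperty champion in data.EnumerateObject())
            {
                string id = champion.Value.GetProperty("id").GetString() ?? "";
                string image = champion.Value
                    .GetProperty("image")
                    .GetProperty("full")
                    .GetString() ?? "";
                result.Add((id, image));
            }
        }
        return result;
    }

    // Assumption: portrait images live under /cdn/<version>/img/champion/.
    public static string ImageUrl(string version, string imageFile)
    {
        return "http://ddragon.leagueoflegends.com/cdn/" + version
            + "/img/champion/" + imageFile;
    }
}
```

With the WebClient from the question, the input to Parse would be the string returned by DownloadString for the champion.json URL instead of SampleJson.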
Answer 1: (score: 0)
Regex helped me match the information I needed:
    // Requires: using System.Text.RegularExpressions;
    MatchCollection m1 = Regex.Matches(html, "\"id\":\"(.+?)\",\"", RegexOptions.Singleline);
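For illustration, here is a minimal sketch of how the captured ids might be read back out of that MatchCollection. The wrapper names ChampionIdRegex and ExtractIds are hypothetical, and the pattern only matches compact JSON: any whitespace after the colon or comma would make it miss entries.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ChampionIdRegex
{
    // Collects the first capture group (the id value) from every match.
    public static List<string> ExtractIds(string html)
    {
        var ids = new List<string>();
        MatchCollection m1 = Regex.Matches(
            html, "\"id\":\"(.+?)\",\"", RegexOptions.Singleline);
        foreach (Match m in m1)
        {
            ids.Add(m.Groups[1].Value);
        }
        return ids;
    }
}
```

Calling ExtractIds on the downloaded champion.json text would give one string per "id" field. A JSON parser is generally more robust for this, since the regex depends on the exact byte layout of the file.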