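I'm new to web development and I don't know how to retrieve content from 4 URLs of the same website; I always get empty results. I'm using Flutter and the package web_scraper: ^0.0.8
I need to retrieve the title, image, description, and URL from the site. The pages I will navigate are:
https://datassette.org/revistas (categories)
https://datassette.org/revistas/videogames (choose the magazine language)
https://datassette.org/revistas/br-brasil (magazines: title, image, and URL)
https://datassette.org/revistas/acao-games/semana-em-acao-especial-games-no-1 (magazine title, description, image, and URL of the PDF).
This is what the .getElement method is: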
/// Returns List of elements found at specified address.
/// Example address: "div.item > a.title" where item and title are class names of div and a tag respectively.
List<Map<String, dynamic>> getElement(String address, List<String> attribs) {
// Attribs are the list of attributes required to extract from the html tag(s) ex. ['href', 'title'].
import 'package:web_scraper/web_scraper.dart';

class WebScraperHelper {
  static final webScraper = WebScraper('https://datassette.org');

  static Future<void> getData() async {
    if (await webScraper.loadWebPage('/revistas')) {
      // it prints the full html
      //print("getPageContent: ${webScraper.getPageContent()}");
      List<Map<String, dynamic>> images = webScraper.getElement(
          'img.width-full.wt-height-full.display-block.position-absolute',
          ['src']);
      List<Map<String, dynamic>> descriptions = webScraper.getElement(
          'h3.text-gray.text-truncate.mb-xs-0.text-body', ['title']);
      List<Map<String, dynamic>> urls = webScraper.getElement(
          'div > ul > li > div > a',
          ['href', 'title']);
      print("images: $images"); // prints []
      print("descriptions: $descriptions"); // prints []
      print("urls: $urls"); // prints []
    }
  }
}
Answer 0: (score: 0)
I don't use that package; I'm more used to regular expressions. Here is an example:
import 'dart:async';
// NetService comes from this local package, which is not shown here.
// It is assumed that NetService.getRaw returns the page body as a String.
import 'package:_samples2/networking.dart';

// get Categories
const kUrlRevistas = 'https://datassette.org/revistas';
// Matches category links such as <a href="/revistas/videogames">Videogames</a>
// and captures the link text.
var regExp1 = RegExp(r'<a href="\/revistas\/\p{L}+">(\p{L}+)<\/a>', unicode: true);

class Revistas {
  static Future fetchRevistas() async {
    print('Start fetching...');
    return await NetService.getRaw(kUrlRevistas)
        .whenComplete(() => print('Fetching done!'));
  }
}

void main(List<String> args) async {
  var data = await Revistas.fetchRevistas();
  var matches = regExp1.allMatches(data);
  print(matches.map((e) => e.group(1)).toList());
}
Result:
Start fetching...
Fetching done!
[Diversas, Eletrônica, Informática, Videogames]
P.S.: You need to read the page's HTML source to know what to match.
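The same regular-expression approach can also capture the link target along with the label. The following is only a minimal sketch, assuming a null-safe Dart SDK: the `sampleHtml` string is hypothetical markup written for illustration (the real page may be structured differently), and `extractCategories` is a helper name introduced here, not part of any package.

```dart
// Hypothetical sample markup; the real page's HTML may differ.
const sampleHtml = '''
<ul>
  <li><a href="/revistas/diversas">Diversas</a></li>
  <li><a href="/revistas/videogames">Videogames</a></li>
</ul>
''';

// Group 1 captures the relative href, group 2 the link text.
final categoryLink = RegExp(r'<a href="(/revistas/[^"]+)">([^<]+)</a>');

// Returns one map per matched link, with the href resolved against [base].
List<Map<String, String>> extractCategories(String html, Uri base) {
  return [
    for (final m in categoryLink.allMatches(html))
      {
        'title': m.group(2)!.trim(),
        'url': base.resolve(m.group(1)!).toString(),
      },
  ];
}

void main() {
  final base = Uri.parse('https://datassette.org');
  for (final c in extractCategories(sampleHtml, base)) {
    print('${c['title']} -> ${c['url']}');
  }
}
```

Resolving each href against the base URL gives an absolute address that can be requested in the next step. A regular expression like this breaks as soon as the anchor tag gains extra attributes, so it should be checked against the actual HTML.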
Answer 1: (score: 0)
After hours of testing, I found a way to retrieve all the data I need.
static Future<void> getMagazines() async {
  if (await webScraper.loadWebPage('/revistas/acao-games')) {
    List<Map<String, dynamic>> maps = [];
    // No attributes requested: only the element text (key "title") is needed.
    List<Map<String, dynamic>> titles = webScraper.getElement(
      'span.field-content > a',
      []
    );
    // add only the maps that have data in the title attribute
    List<Map<String, dynamic>> urls = webScraper.getElement(
      'div.field-content > a',
      ['href']
    );
    List<Map<String, dynamic>> images = webScraper.getElement(
      'div.field-content > a > img',
      ['src']
    );
    // Assumes the three lists have the same length and order.
    for (int i = 0; i < urls.length; i++) {
      maps.add({
        "title": titles[i]["title"],
        "url": urls[i]["attributes"]["href"],
        "image": images[i]["attributes"]["src"],
      });
    }
    //print("TITLES: $titles");
    //print("URLS: $urls");
    //print("IMAGES: $images");
    print(maps.length);
  }
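The loop above indexes three lists by the length of one of them, so it would throw a RangeError if a selector matched fewer elements than the others. Below is a minimal sketch of a more defensive merge, assuming a null-safe Dart SDK and the result shape used above (a "title" key plus an "attributes" map). The `combine` helper and the mock data are hypothetical and only illustrate the idea; they do not call the package.

```dart
import 'dart:math' as math;

// Merges the three result lists, stopping at the shortest one so that
// a selector matching fewer elements cannot cause a RangeError.
List<Map<String, String?>> combine(
  List<Map<String, dynamic>> titles,
  List<Map<String, dynamic>> urls,
  List<Map<String, dynamic>> images,
) {
  final count = [titles.length, urls.length, images.length].reduce(math.min);
  return [
    for (var i = 0; i < count; i++)
      {
        'title': (titles[i]['title'] as String?)?.trim(),
        'url': urls[i]['attributes']['href'] as String?,
        'image': images[i]['attributes']['src'] as String?,
      },
  ];
}

void main() {
  // Mock data in the assumed shape; values are made up for illustration.
  final List<Map<String, dynamic>> titles = [
    {'title': ' Issue 1 ', 'attributes': <String, dynamic>{}},
    {'title': 'Issue 2', 'attributes': <String, dynamic>{}},
  ];
  final List<Map<String, dynamic>> urls = [
    {'title': '', 'attributes': {'href': '/revistas/example/issue-1'}},
    {'title': '', 'attributes': {'href': '/revistas/example/issue-2'}},
  ];
  final List<Map<String, dynamic>> images = [
    {'title': '', 'attributes': {'src': '/files/issue-1.jpg'}},
  ];

  final magazines = combine(titles, urls, images);
  print(magazines.length);
  print(magazines.first);
}
```

Truncating to the shortest list avoids the crash but can hide a mismatch between selectors, so comparing the three lengths while debugging is still worthwhile.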