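I am trying to scrape a specific number from https://www.ulb.uni-muenster.de/. The number is dynamic. Unfortunately, when I search for it, I only get the element with its class, not the number. When I inspect the page in Chrome, I can clearly see the number in the source. I have tried two approaches: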
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'https://www.ulb.uni-muenster.de/'
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
# find the element that should hold the seat counter
tags = soup.find('span', {'class': 'seatingsCounter'})
print(tags)
Output: <span class="seatingsCounter"></span>
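import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.ulb.uni-muenster.de/')
data = BeautifulSoup(r.content, 'lxml')
# collect every <a> tag (not needed for the counter itself)
examples = []
for d in data.find_all('a'):
    examples.append(d)
# search the document parsed in this snippet ("data", not "soup")
my_as = data.find_all('span', {'class': 'seatingsCounter'})
print(my_as)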
Output: [<span class="seatingsCounter"></span>]
Neither works, because the output is always just the element with its class and no number inside.
Answer (score: 1)
If you look at the page source, you will see that the number of free seats is filled in by the JavaScript function showMessage:
var showMessage = function(data) {
    var locations = [ "ZB_LS", "ZB_RS" ];
    var free = 0;
    var total = 0;
    var open = true;
    $('.availableSeatings .spinner').remove();
    $('.availableSeatings .error').data('counter', 0);
    $.each(data.locations, function( key, value ) {
        if ($.inArray( value.id, locations) !== -1)
        {
            free = free + Math.round((100 - value.quota) * value.places/100);
            total = total + value.places;
            open = open && value.open;
        }
    });
    if (open)
    {
        $('.availableSeatings .message').show().siblings().hide();
        quota = Math.round(free/total * 100);
        result = free + '<span class="quota">(' + quota + '%)</span>';
        date = $.format.date(data.datetime, "dd.MM.yyyy, HH:mm");
        $('.availableSeatings .seatingsCounter').html(result); // <- HERE!!
        $('.availableSeatings .updated .datetime').text(date);
        $('.availableSeatings .updated').show();
    } else {
        $('.availableSeatings .closed').show().siblings().hide();
    }
};
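Once you have the JSON data, the arithmetic in showMessage can be reproduced in Python. The sketch below is a port under a few assumptions: that `quota` is the percentage of occupied seats (which is what `(100 - value.quota)` implies), that only the IDs in the script's `locations` array ("ZB_LS", "ZB_RS") are counted, and that JavaScript's `Math.round` (halves round up) should be mimicked instead of Python's `round` (halves round to even). `js_round` and `seating_summary` are hypothetical helper names, not part of the site.

```python
import math

# IDs that showMessage() sums up (copied from its `locations` array).
MAIN_LIBRARY_IDS = ("ZB_LS", "ZB_RS")


def js_round(x):
    """Mimic JavaScript's Math.round: halves round up, unlike Python's round()."""
    return math.floor(x + 0.5)


def seating_summary(data, ids=MAIN_LIBRARY_IDS):
    """Return (free, total, quota_percent, is_open) the way showMessage() computes them."""
    free = 0
    total = 0
    is_open = True
    for loc in data["locations"]:
        if loc["id"] in ids:
            # Assumption: "quota" is the occupied share in percent.
            free += js_round((100 - loc["quota"]) * loc["places"] / 100)
            total += loc["places"]
            is_open = is_open and loc["open"]
    # The JS would produce NaN for total == 0; this sketch returns 0 instead.
    quota = js_round(free / total * 100) if total else 0
    return free, total, quota, is_open
```

For the November 2019 response shown further down in this answer, this gives 16 free seats out of 832, which the page would render as `16(2%)`.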
Further down in the source, you will find these lines:
$.ajax({
    dataType: "json",
    url: "/available-seatings.json", // <-- THIS LOOKS INTERESTING
    timeout: 40000,
    success: function(data) { showMessage(data); },
    error: function() {
        counter = $('.availableSeatings .error').data('counter');
        if (isNaN(counter) || counter >= 3)
        {
            showError();
        } else {
            $('.availableSeatings .error').data('counter', counter + 1);
        }
    },
    complete: function() {
        setTimeout(worker, 60000);
    }
});
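You can make the same request from Python without a browser. This is a minimal sketch that uses only the standard library: the `urlopen` the question already imports, plus `json`. It assumes the relative URL `/available-seatings.json` resolves against the site root. The 40-second timeout, 60-second interval and three-error tolerance are copied from the `$.ajax` call. Unlike the page, which keeps polling and shows an error message, this simplified loop gives up with a `RuntimeError`. `fetch_seatings`, `poll` and `SEATINGS_URL` are hypothetical names.

```python
import itertools
import json
import time
from urllib.request import urlopen

# Assumed absolute form of the relative URL "/available-seatings.json" used by $.ajax.
SEATINGS_URL = "https://www.ulb.uni-muenster.de/available-seatings.json"


def fetch_seatings(url=SEATINGS_URL, timeout=40):
    """Download the seat data and decode it (timeout in seconds, like timeout: 40000 ms)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll(handle, fetch=fetch_seatings, interval=60, max_errors=3, rounds=None):
    """Call handle(data) after each successful fetch, sleeping `interval` s between attempts.

    rounds=None polls forever. A success resets the error counter; the failure
    after `max_errors` consecutive ones raises instead of showing a message.
    """
    errors = 0
    attempts = itertools.count() if rounds is None else range(rounds)
    for i in attempts:
        if i:
            time.sleep(interval)  # like setTimeout(worker, 60000)
        try:
            data = fetch()
        except (OSError, ValueError) as exc:  # network errors, timeouts, bad JSON
            errors += 1
            if errors > max_errors:
                raise RuntimeError("seat data unavailable") from exc
        else:
            errors = 0
            handle(data)
```

For a single check, `poll(lambda data: print(data["datetime"]), rounds=1)` fetches once and prints the timestamp. Passing `fetch` as a parameter lets the loop be tested without network access.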
If we go to https://www.ulb.uni-muenster.de/available-seatings.json, we see something like this:
{"datetime":"2019-11-13 13:49:46","locations":[
  {"id":"ZB_LS","label":"Zentralbibliothek Lesesaal","open":true,"quota":99,"places":678},
  {"id":"ZB_RS","label":"Zentralbibliothek Recherchesaal","open":true,"quota":94,"places":154},
  {"id":"VSTH","label":"Bibliothek im Vom-Stein-Haus","open":true,"quota":56,"places":145},
  {"id":"RWS1","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Einzelarbeitszone","open":true,"quota":98,"places":352},
  {"id":"RWS1_G","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Gruppenarbeitszone","open":true,"quota":30,"places":40},
  {"id":"RWS2","label":"Bibliothek im Rechtswissenschaftlichen Seminar II","open":true,"quota":54,"places":162},
  {"id":"WIWI","label":"Fachbereichsbibliothek Wirtschaftswissenschaften \/ Einzelarbeitszone","open":true,"quota":71,"places":132},
  {"id":"WIWI_G","label":"Fachbereichsbibliothek Wirtschaftswissenschaften \/ Gruppenarbeitszone","open":true,"quota":98,"places":45},
  {"id":"ZBSOZ","label":"Zweigbibliothek Sozialwissenschaften","open":true,"quota":74,"places":129},
  {"id":"FHAUS","label":"Gemeinschaftsbibliothek im F\u00fcrstenberghaus","open":true,"quota":68,"places":197},
  {"id":"IFE","label":"Bibliothek des Instituts f\u00fcr Erziehungswissenschaft","open":true,"quota":47,"places":183},
  {"id":"PHI","label":"Bibliotheken im Philosophikum (Domplatz 23)","open":true,"quota":68,"places":98}
]}
Voilà! Reading this JSON with Python's json module is probably much easier than rewriting the scraper with Selenium, although that would work too.
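Here is a minimal sketch of the json-module route. So that it runs offline, it uses a trimmed copy of the response above (three of its twelve locations) as input. `locations_by_id` and `SAMPLE` are hypothetical names. Raw strings keep the `\/` and `\u00fc` escapes intact, which shows that `json.loads` decodes them to `/` and `ü`. For live data, the string would instead be the decoded body returned by `urlopen(...)` for the `available-seatings.json` URL.

```python
import json

# Trimmed copy of the response above (raw strings keep the JSON escapes intact).
SAMPLE = (
    r'{"datetime":"2019-11-13 13:49:46","locations":['
    r'{"id":"ZB_LS","label":"Zentralbibliothek Lesesaal","open":true,"quota":99,"places":678},'
    r'{"id":"RWS1","label":"Bibliothek im Rechtswissenschaftlichen Seminar I \/ Einzelarbeitszone","open":true,"quota":98,"places":352},'
    r'{"id":"FHAUS","label":"Gemeinschaftsbibliothek im F\u00fcrstenberghaus","open":true,"quota":68,"places":197}'
    r']}'
)


def locations_by_id(payload):
    """Parse the available-seatings JSON and index its locations by their "id" field."""
    data = json.loads(payload)
    return {loc["id"]: loc for loc in data["locations"]}


for loc_id, loc in locations_by_id(SAMPLE).items():
    status = "open" if loc["open"] else "closed"
    # "quota" appears to be the occupied share in percent, judging from showMessage.
    print(f'{loc_id}: {loc["label"]} ({status}, {loc["quota"]}% occupied, {loc["places"]} places)')
```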