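I am trying to print the dates from the list of links at the bottom of this site. I can't tell what is going wrong, because no error is raised. I tried a simpler approach that works on the New York Times website to retrieve all the hrefs, but it did not work here, so I looked into the user agent.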
import urllib
import lxml.html
import urllib2
from urllib import URLopener
URLopener.version
from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
MyOpener.version
myopener = MyOpener()
page = myopener.open('https://flight-data.adsbexchange.com/activity?inputSelect=registration&registration=N12345')
page.read()
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
for line in soup.find_all('a'):
    print(line.get('href'))
Answer 0 (score: 0)
Run the following script. It gives you all the links you are after:
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests
page_url = "https://flight-data.adsbexchange.com/activity?inputSelect=registration&registration=N12345"
page = requests.get(page_url).text
soup = BeautifulSoup(page, "lxml")
for items in soup.select(".dates"):
    print(urljoin(page_url, items['href']))
Partial output:
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-14
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-09
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-08
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-05
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-10-31
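Since the question asks for the dates rather than the full links, here is a minimal sketch of one way to pull the date out of each link using only the standard library. The helper name `extract_date` and the `sample_hrefs` list are hypothetical, and the sketch assumes each link carries the date in a `date` query parameter, as the partial output suggests.

```python
from urllib.parse import urljoin, urlparse, parse_qs

# Page URL used in the script above; serves as the base for relative links.
PAGE_URL = "https://flight-data.adsbexchange.com/activity?inputSelect=registration&registration=N12345"


def extract_date(href, base=PAGE_URL):
    """Return the value of the `date` query parameter of a link, or None."""
    absolute = urljoin(base, href)             # resolve a relative href against the page URL
    query = parse_qs(urlparse(absolute).query)  # e.g. {'icao': [...], 'date': [...]}
    values = query.get("date")
    return values[0] if values else None


# Hypothetical relative hrefs, shaped like the links in the partial output.
sample_hrefs = [
    "/map?icao=A061D9&date=2017-11-14",
    "/map?icao=A061D9&date=2017-11-09",
]

dates = [extract_date(h) for h in sample_hrefs]
for d in dates:
    print(d)
```

In the script above, the same helper could be applied to `items['href']` inside the loop instead of printing the joined URL.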