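I am trying to scrape data from a website, but so far without success. I have tried several approaches, and the one below is the most promising. I want to get the yearBuild value from the page. Can anyone help? Any pointers would be highly appreciated.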
import bs4 as bs
from selenium import webdriver

# Load the page in a real browser and grab the HTML
wd = webdriver.Chrome()
url = "https://www.marinetraffic.com/en/ais/details/ships/mmsi:255805792"
wd.get(url)
html_source = wd.page_source
wd.quit()

# Name the parser explicitly to avoid the "no parser specified" warning
soup = bs.BeautifulSoup(html_source, 'html.parser')
elems = soup.select('#yearBuild > b')
print(elems)
print(soup.prettify())
Here elems comes back as an empty list.
Answer 0: (score: 1)
You can use their API to get information about the ship.
For example:
import re
import json
import requests

url = 'https://www.marinetraffic.com/en/ais/details/ships/mmsi:255805792'
ship_info_url = 'https://www.marinetraffic.com/en/vesselDetails/vesselInfo/shipid:{ship_id}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

# Read the ship ID out of the final URL of the response
r = requests.get(url, headers=headers)
ship_id = re.search(r'shipid:(\d+)', r.url)[1]

# Fetch the vessel details as JSON
data = requests.get(ship_info_url.format(ship_id=ship_id), headers=headers).json()
print(json.dumps(data, indent=4))
print('Year Built =', data['yearBuilt'])
Prints:
{
    "name": "LAILA",
    "nameAis": "LAILA",
    "imo": 9377559,
    "eni": null,
    "mmsi": 255805792,
    "callsign": "CQDP",
    "country": "Portugal",
    "countryCode": "PT",
    "type": "Cargo - Hazard A (Major)",
    "typeSpecific": "Container Ship",
    "typeColor": "7",
    "grossTonnage": 28048,
    "deadweight": 38080,
    "teu": 2700,
    "liquidGas": null,
    "length": 215.5,
    "breadth": 29.87,
    "yearBuilt": 2008,
    "status": "Active",
    "isNavigationalAid": false,
    "correspondingRoamingStationId": null,
    "homePort": null
}
Year Built = 2008
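One caveat with the snippet above: if the final URL does not contain a shipid: segment, re.search returns None and the [1] subscript raises a TypeError. Below is a minimal sketch of a more defensive lookup. The helper name extract_ship_id and the sample URL (including the ID 123456) are made up for illustration; they are not taken from the site.

```python
import re


def extract_ship_id(final_url):
    """Return the digits following 'shipid:' in a URL, as a string.

    Hypothetical helper: it assumes the URL carries a 'shipid:<digits>'
    segment, which is what the regex in the answer above looks for.
    """
    match = re.search(r'shipid:(\d+)', final_url)
    if match is None:
        # Fail with a readable message instead of a TypeError on None[1]
        raise ValueError(f'no shipid found in {final_url!r}')
    return match.group(1)


# Illustrative URL only; the ID 123456 is invented, not a real vessel ID.
sample_url = 'https://www.marinetraffic.com/en/ais/details/ships/shipid:123456/mmsi:255805792'
print(extract_ship_id(sample_url))
```

In the answer's code this would replace the re.search(...)[1] line, called as extract_ship_id(r.url).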
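Answer 1: (score: 0)
May I suggest using VesselFinder instead of MarineTraffic? The data is the same, but MarineTraffic is hard to scrape because the page is built entirely with JavaScript, whereas VesselFinder can be scraped with just BeautifulSoup.
VesselFinder also presents the data in tables, so it is easy to parse with pandas.
Here is the code: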
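from io import StringIO

import pandas as pd
import requests

r = requests.get('https://www.vesselfinder.com/vessels/LAILA-IMO-9377559-MMSI-255805792', headers={'User-Agent': 'iPhone'})
# read_html returns one DataFrame per <table>; wrap the text in StringIO,
# since passing a literal HTML string is deprecated in recent pandas
tables = pd.read_html(StringIO(r.text))
# Stack the two key/value tables and turn them into a {label: value} dict
ship = pd.concat([tables[2], tables[3]], ignore_index=True).set_index(0).to_dict()[1]
print(ship['Year of Built'])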
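To make the key/value idea concrete without hitting the network, here is a minimal sketch that turns a two-column HTML table into a dict using only the standard library. It is a stand-in for the pandas chain above, not what pandas does internally. The sample HTML is hand-written for illustration: the 'Vessel Name' label is an assumption, and the real page layout may differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_dict(html):
    """Map first-column labels to second-column values (two-cell rows only)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# Hand-written sample, not copied from the site.
sample_html = """
<table>
  <tr><td>Vessel Name</td><td>LAILA</td></tr>
  <tr><td>Year of Built</td><td>2008</td></tr>
</table>
"""
ship = table_to_dict(sample_html)
print(ship['Year of Built'])
```

Note that values parsed this way are strings, so convert with int(...) before doing arithmetic on the year.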