I want to extract a table from a URL, but I'm lost... Here is what I have done so far:
import requests
from bs4 import BeautifulSoup

url = "https://www.marinetraffic.com/en/ais/index/ports/all/per_page:50"
headers = {'User-agent': 'Mozilla/5.0'}
raw_html = requests.get(url, headers=headers)
raw_data = raw_html.text
soup_data = BeautifulSoup(raw_data, "lxml")
td = soup_data.findAll('tr')[1:]  # skip the header row
country = []
for data in td:
    col = data.find_all('td')
    country.append(col)
How can I get the text and URLs of certain columns (country, port name, UN/LOCODE, type, and port map)?
Answer 0: (score: 1)
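I did some scraping for you. You can use a dictionary whose keys are the table headers, as shown below. Loop over the individual td cells to pick out the columns you want, then use .text to get a cell's text and the src, href, etc. attributes to get the URLs.
import requests
from bs4 import BeautifulSoup

url = "https://www.marinetraffic.com/en/ais/index/ports/all/per_page:50"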
headers = {'User-agent': 'Mozilla/5.0'}
raw_html = requests.get(url, headers=headers)
raw_data = raw_html.text
soup_data = BeautifulSoup(raw_data, "lxml")
rows = soup_data.find_all('tr')[1:]  # skip the header row
country = []
for row in rows:
    details = {}
    # The column index decides which field each cell fills
    for i, cell in enumerate(row.find_all('td')):
        if i == 0:
            # Flag image: prefix the relative src with the site root
            details['Img-src'] = "https://www.marinetraffic.com" + cell.find('img')['src']
        elif i == 1:
            details['Port_name'] = cell.text.replace('\n', '')
        elif i == 2:
            details['UN/LOCODE'] = cell.text.replace('\r\n', '').replace(" ", "")
        elif i == 4:
            details['type'] = cell.text.replace('\r\n', '').replace(" ", "")
        elif i == 5:
            # Map link: prefix the relative href with the site root
            details['map_url'] = "https://www.marinetraffic.com" + cell.find('a')['href']
    country.append(details)
Hope this helps.
Output:
[{'Img-src': 'https://www.marinetraffic.com/img/flags/png40/CN.png', 'Port_name': 'SHANGHAI', 'UN/LOCODE': 'CNSHA', 'map_url': 'https://www.marinetraffic.com/en/ais/home/zoom:9/centerx:121.614746/centery:31.3663635/showports:true/portid:1253', 'type': 'Port'}, {'Img-src': 'https://www.marinetraffic.com/img/flags/png40/CN.png', 'Port_name': 'MAANSHAN', 'UN/LOCODE': 'CNMAA', 'map_url': 'https://www.marinetraffic.com/en/ais/home/zoom:14/centerx:118.459503/centery:31.7180004/showports:true/portid:2746', 'type': 'Port'}, {'Img-src': 'https://www.marinetraffic.com/img/flags/png40/HK.png', 'Port_name': 'HONG KONG', 'UN/LOCODE': 'HKHKG', 'map_url': 'https://www.marinetraffic.com/en/ais/home/zoom:14/centerx:114.181366/centery:22.2879486/showports:true/portid:2429', 'type': 'Port'}, ... ]
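Once the rows are collected as a list of dictionaries like the one above, two small clean-ups may be worth considering: normalising whitespace in cell text with split/join instead of chained replace calls, and building absolute URLs with urllib.parse.urljoin instead of string concatenation. The sketch below uses only the standard library. The helper names (clean_text, absolute_url, records_to_csv) and the sample raw values are illustrative assumptions, not part of the original answer.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = "https://www.marinetraffic.com"


def clean_text(raw):
    """Trim the string and collapse any run of whitespace into one space."""
    return " ".join(raw.split())


def absolute_url(path, base=BASE_URL):
    """Resolve a relative href/src value against the site root."""
    return urljoin(base, path)


def records_to_csv(records, fieldnames):
    """Serialise a list of dicts to CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical raw values, shaped like what the scraper might read from one row.
record = {
    "Port_name": clean_text("\n HONG   KONG \r\n"),
    "UN/LOCODE": clean_text("\r\n   HKHKG  "),
    "Img-src": absolute_url("/img/flags/png40/HK.png"),
}
csv_text = records_to_csv([record], ["Port_name", "UN/LOCODE", "Img-src"])
```

Unlike replace(" ", ""), clean_text keeps a single space between words, so a multi-word port name such as HONG KONG stays readable. If a column really should have every space removed, the original replace chain is still the right tool.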