I am trying to scrape airport names. I am using the following code, but I only get 2 rows back, and none of the data inside them.
import requests
from bs4 import BeautifulSoup
url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
page_html = requests.get(url)
page_text = page_html.text
soup = BeautifulSoup(page_text, "html.parser")
table = soup.find('table', {'class': 'sortable'})
for tr in table.findAll('tr'):
    for tb in tr.findAll('tb'):
        print(tb.text)
Answer 0: (score: 0)
You can try something like this and pull out whatever fields you need from it.
import requests
from bs4 import BeautifulSoup
url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
page_html = requests.get(url)
page_text = page_html.text
soup = BeautifulSoup(page_text, "html.parser")
# Collect the text of every <td> cell on the page into one flat list.
table = soup.find_all('td')
list_1 = []
for i in table:
    list_1.append(i.text)
# Skip the first 10 cells, then read the rest in groups of 5 (one group per airport).
for chunk in range(0, len(list_1[10:]), 5):
    chunks = list_1[10:][chunk:chunk+5]
    print("IATA code : {}".format(chunks[0]))
    print("ICAO code : {}".format(chunks[1]))
    print("Airport : {}".format(chunks[2]))
    print("City : {}".format(chunks[3]))
    print("Country : {}".format(chunks[4]))
    print("---------------------------")
Output:
IATA code : AAA
ICAO code : NTGA
Airport : Anaa Airport
City : Anaa
Country : French Polynesia
---------------------------
IATA code : AAB
ICAO code : YARY
Airport : Arrabury Airport
City : Arrabury
Country : Australia
---------------------------
IATA code : AAC
ICAO code : HEAR
Airport : Al Arish Airport
City : Al Arish
Country : Egypt
---------------------------
IATA code : AAD
ICAO code : -
Airport : Ad-Dabbah Airport
City : Ad-Dabbah
Country : Sudan
---------------------------
.....
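The slicing loop in this answer reads a flat list of cell texts five at a time. A minimal sketch of that grouping step as a reusable helper is shown here; `chunk_rows` and its `size` and `skip` parameters are hypothetical names, not part of the original answer, and the sample cells are copied from the output above so the snippet runs without a network request. Unlike the original loop, this sketch drops a trailing group with fewer than five cells instead of failing with an IndexError.

```python
def chunk_rows(cells, size=5, skip=0):
    """Group a flat list of cell texts into rows of `size` cells.

    `skip` drops leading cells (for example header cells) before grouping.
    A trailing group shorter than `size` is discarded, so accessing
    row[4] can never raise IndexError.
    """
    data = cells[skip:]
    rows = []
    for start in range(0, len(data), size):
        row = data[start:start + size]
        if len(row) == size:
            rows.append(row)
    return rows


# Sample cells taken from the output above, plus one stray trailing cell
# to show that incomplete groups are ignored.
cells = [
    "AAA", "NTGA", "Anaa Airport", "Anaa", "French Polynesia",
    "AAB", "YARY", "Arrabury Airport", "Arrabury", "Australia",
    "stray",
]

for iata, icao, airport, city, country in chunk_rows(cells):
    print("{} ({}): {}, {}, {}".format(iata, icao, airport, city, country))
```

With the scraped list from the answer, the equivalent call would be `chunk_rows(list_1, size=5, skip=10)`.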
Answer 1: (score: 0)
I'm not sure this is exactly what you want, but if you run the following script you will get the data from that table.
import requests
from bs4 import BeautifulSoup
url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
page_html = requests.get(url)
soup = BeautifulSoup(page_html.text, "lxml")
table = soup.find('table', class_='sortable')
for tr in table.find_all('tr'):
    data = ' '.join([item.text.strip() for item in tr.find_all('td')])
    print(data)
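One likely reason the code in the question prints nothing is that it searches for 'tb', which is not an HTML tag, whereas table cells are 'td' elements. The sketch below illustrates the difference on a small inline table. `SAMPLE_HTML` is a hypothetical stand-in for the real page (its header row and markup are assumptions; only the 'sortable' class and the sample values come from this thread), so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the table shape matters.
SAMPLE_HTML = """
<table class="sortable">
  <tr><th>IATA</th><th>ICAO</th><th>Airport</th><th>City</th><th>Country</th></tr>
  <tr><td>AAA</td><td>NTGA</td><td>Anaa Airport</td><td>Anaa</td><td>French Polynesia</td></tr>
  <tr><td>AAB</td><td>YARY</td><td>Arrabury Airport</td><td>Arrabury</td><td>Australia</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="sortable")

# 'tb' is not an HTML tag, so this search matches nothing.
wrong = table.find_all("tb")

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row contains only <th> cells, so it is skipped
        rows.append(cells)

print(len(wrong))
for row in rows:
    print(" | ".join(row))
```

Keeping each row as a list of cells, rather than joining it into one string, makes it easier to pick out a single column such as the airport name later.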
Answer 2: (score: 0)
This code prints only the cities. If you want the other fields, just uncomment the corresponding lines.
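from io import StringIO
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
# Assumption: the original snippet never defined `table`; the 'sortable' table from the question is used here.
table = soup.find('table', class_='sortable')
# Newer pandas versions expect a file-like object rather than a raw HTML string.
df = pd.read_html(StringIO(str(table)))[0]
# IATA_code = df[0]
# ICAO_code = df[1]
# airport = df[2]
city = df[3]
# country = df[4]
# The first row is skipped with [1:].
print(list(city[1:]))
# print(list(country[1:]))
# print(list(airport[1:]))
# print(list(ICAO_code[1:]))
# print(list(IATA_code[1:]))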