Scraping a table with Python / BeautifulSoup

Posted: 2017-12-19 03:22:53

Tags: python web-scraping beautifulsoup

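I'm trying to scrape the airport data from this table. I'm using the following code, but I only get 2 lines instead of the data in the table.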

import requests
from bs4 import BeautifulSoup

url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
page_html = requests.get(url)
page_text = page_html.text
soup = BeautifulSoup(page_text, "html.parser")
table = soup.find('table', {'class': 'sortable'})
for tr in table.findAll('tr'):
    for tb in tr.findAll('tb'):
        print(tb.text)

3 answers:

Answer 0 (score: 0)

You can try something like this and extract the data with the following code:

import requests
from bs4 import BeautifulSoup

url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
page_html = requests.get(url)
page_text = page_html.text
soup = BeautifulSoup(page_text, "html.parser")

# Collect the text of every <td> cell on the page into one flat list
cells = [td.text for td in soup.find_all('td')]

# Skip the first 10 cells (assumed to precede the airport rows on this page),
# then walk the rest in groups of 5, one group per airport
airport_cells = cells[10:]
for start in range(0, len(airport_cells), 5):
    chunk = airport_cells[start:start + 5]
    if len(chunk) < 5:
        break  # ignore a trailing incomplete group instead of raising IndexError
    print("IATA code : {}".format(chunk[0]))
    print("ICAO code : {}".format(chunk[1]))
    print("Airport : {}".format(chunk[2]))
    print("City : {}".format(chunk[3]))
    print("Country : {}".format(chunk[4]))

    print("---------------------------")

Output:

IATA code : AAA
ICAO code : NTGA
Airport : Anaa Airport
City : Anaa
Country : French Polynesia
---------------------------
IATA code : AAB
ICAO code : YARY
Airport : Arrabury Airport
City : Arrabury
Country : Australia
---------------------------
IATA code : AAC
ICAO code : HEAR
Airport : Al Arish Airport
City : Al Arish
Country : Egypt
---------------------------
IATA code : AAD
ICAO code : -
Airport : Ad-Dabbah Airport
City : Ad-Dabbah
Country : Sudan
---------------------------
    .....
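Instead of a flat list with a hard-coded offset of 10 and a chunk size of 5, the cells can be read row by row, so each `<tr>` yields one airport. This is a minimal sketch, assuming the five-column layout shown in the output above and a header row at the top of the table. `SAMPLE_HTML`, `FIELDS` and `parse_airports` are hypothetical names, and the sample markup is made up so the example runs offline; it is not copied from the live page, whose exact structure I have not verified.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the table: a header row followed by data rows.
SAMPLE_HTML = """
<table class="sortable">
  <tr><td>IATA</td><td>ICAO</td><td>Airport</td><td>City</td><td>Country</td></tr>
  <tr><td>AAA</td><td>NTGA</td><td>Anaa Airport</td><td>Anaa</td><td>French Polynesia</td></tr>
  <tr><td>AAB</td><td>YARY</td><td>Arrabury Airport</td><td>Arrabury</td><td>Australia</td></tr>
</table>
"""

# Column names taken from the output shown in the answer above.
FIELDS = ["IATA code", "ICAO code", "Airport", "City", "Country"]


def parse_airports(html):
    """Return one dict per data row of the first table with class 'sortable'."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="sortable")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Keep only rows with exactly five cells; empty or malformed rows are skipped.
        if len(cells) != len(FIELDS):
            continue
        rows.append(dict(zip(FIELDS, cells)))
    # Drop the header row (assumed to be the first five-cell row).
    return rows[1:]


if __name__ == "__main__":
    for airport in parse_airports(SAMPLE_HTML):
        for field, value in airport.items():
            print("{} : {}".format(field, value))
        print("---------------------------")
```

Working per row means a missing or extra cell only affects that one row, whereas in the flat-list version it would shift every following group of 5.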

Answer 1 (score: 0)

I'm not sure if this is what you want. Run the following script and you will get the data from that table.

import requests
from bs4 import BeautifulSoup

url = 'http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm'
page_html = requests.get(url)
# The "lxml" parser requires the lxml package; "html.parser" needs no extra install
soup = BeautifulSoup(page_html.text, "lxml")
table = soup.find('table', class_='sortable')
for tr in table.find_all('tr'):
    # Join the stripped text of every <td> cell in the row with spaces
    data = ' '.join([item.text.strip() for item in tr.find_all('td')])
    print(data)
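The key difference from the question's code is the tag name. The question's inner loop calls `tr.findAll('tb')`, but HTML has no `<tb>` element, so that loop never matches anything; table data cells are `<td>` (header cells are `<th>`). This answer works because it searches for `td`. (`findAll` is an older alias of `find_all` in BeautifulSoup 4.) A small offline check, using a hypothetical two-row snippet rather than the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like the airport table (not fetched from the site).
HTML = """
<table class="sortable">
  <tr><th>IATA</th><th>City</th></tr>
  <tr><td>AAA</td><td>Anaa</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table", {"class": "sortable"})

# 'tb' is not an HTML element, so this finds nothing in any row.
tb_cells = [tb.text for tr in table.find_all("tr") for tb in tr.find_all("tb")]

# 'td' is the data-cell tag; the header row uses 'th', so it contributes nothing here.
td_cells = [td.text for tr in table.find_all("tr") for td in tr.find_all("td")]

# Passing a list matches either tag, which also picks up the header cells.
all_cells = [c.text for tr in table.find_all("tr") for c in tr.find_all(["th", "td"])]

print(tb_cells)   # []
print(td_cells)   # ['AAA', 'Anaa']
print(all_cells)  # ['IATA', 'City', 'AAA', 'Anaa']
```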

Answer 2 (score: 0)

This code only prints the cities. If you want the other columns, just uncomment the corresponding lines.

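import io

import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.airlineupdate.com/content_public/codes/airportcodes/airports-by-iata/iata-a.htm")

soup = BeautifulSoup(res.content, 'lxml')
table = soup.find('table', class_='sortable')

# read_html returns a list of DataFrames; wrapping the HTML in StringIO
# avoids the FutureWarning newer pandas versions emit for literal HTML strings
df = pd.read_html(io.StringIO(str(table)))[0]
# IATA_code=df[0]
# ICAO_code=df[1]
# airport=df[2]
city=df[3]
# country=df[4]

# [1:] skips the first row, which in this answer holds the column titles
print(list(city[1:]))
# print(list(country[1:]))
# print(list(airport[1:]))
# print(list(ICAO_code[1:]))
# print(list(IATA_code[1:]))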