Are there any specific instructions for getting BeautifulSoup to scrape correctly?

Asked: 2020-01-31 03:31:22

Tags: web-scraping beautifulsoup

I am trying to scrape a table from Wikipedia. I tried finding ('div', class_='mw-parser-output'), and it returned the text. But why does the table tag return an empty list? Please explain. Thank you.
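A minimal sketch of the attempt described above; the target URL and the exact calls are assumptions, since the original code is not shown:

import requests
from bs4 import BeautifulSoup

# Assumed target page and calls, reconstructed from the description above
url = 'https://en.wikipedia.org/wiki/Makati'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Finding the content div returns its text, as described in the question
content = soup.find('div', class_='mw-parser-output')

# The question is why a table lookup like this can come back empty
tables = content.find_all('table')
print(tables)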

1 Answer:

Answer 0 (score: 0)

To scrape the second table from the wiki page, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Makati'

# Parse the page with Python's built-in html.parser
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Data tables on Wikipedia carry the "wikitable" class;
# index [1] selects the second such table on the page
second_table = soup.select('.wikitable')[1]

# Print the header/data cells of each row in aligned columns
for tr in second_table.select('tr'):
    print('{:<25} {:<25} {:<25} {:<25} {:<25}'.format(*[t.get_text(strip=True) for t in tr.select('th, td')]))

Prints:

Barangay                  Population (2004)         Population (2010)[51]     Area (km2)                District                 
Bangkal                   22,433                    23,378                    0.74                      1st                      
Bel-Air                   9,330                     18,280                    1.71                      1st                      
Carmona                   3,699                     3,096                     0.34                      1st                      
Cembo                     25,815                    27,998                    0.22                      2nd                      
Comembo                   14,174                    14,433                    0.27                      2nd                      
Dasmariñas                5,757                     5,654                     1.90                      1st                      
East Rembo                23,902                    26,433                    0.44                      2nd                      
Forbes Park               3,420                     2,533                     2.53                      1st                      
Guadalupe Nuevo           22,493                    18,271                    0.57                      2nd                      
Guadalupe Viejo           13,632                    16,411                    0.62                      2nd                      

... and so on.
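As a follow-up, the same data can also be loaded in one step with pandas.read_html, which parses every table element on the page into a DataFrame. This is a minimal sketch, assuming pandas (plus the lxml or html5lib parser it relies on) is installed; the table index is an assumption and may differ from the .wikitable index used above, because read_html counts all tables, not just those with the wikitable class:

import pandas as pd

# read_html returns one DataFrame per table element on the page;
# the index [1] is an assumption and may need adjusting
tables = pd.read_html('https://en.wikipedia.org/wiki/Makati')
print(tables[1].head())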