I am trying to scrape a table from Wikipedia. I tried passing ('div', class_='mw-parser-output'), and that returned text. But why does searching for the table tag return an empty list? Please explain. Thanks.
Answer 0 (score: 0)
To scrape the second table from the wiki page, you can use the following example:
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Makati'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Select every element with class "wikitable"; index 1 is the second match.
second_table = soup.select('.wikitable')[1]

for tr in second_table.select('tr'):
    # Collect the text of each header and data cell in this row.
    cells = [t.get_text(strip=True) for t in tr.select('th, td')]
    print('{:<25} {:<25} {:<25} {:<25} {:<25}'.format(*cells))
Prints:
Barangay                  Population (2004)         Population (2010)[51]     Area (km2)                District
Bangkal                   22,433                    23,378                    0.74                      1st
Bel-Air                   9,330                     18,280                    1.71                      1st
Carmona                   3,699                     3,096                     0.34                      1st
Cembo                     25,815                    27,998                    0.22                      2nd
Comembo                   14,174                    14,433                    0.27                      2nd
Dasmariñas                5,757                     5,654                     1.90                      1st
East Rembo                23,902                    26,433                    0.44                      2nd
Forbes Park               3,420                     2,533                     2.53                      1st
Guadalupe Nuevo           22,493                    18,271                    0.57                      2nd
Guadalupe Viejo           13,632                    16,411                    0.62                      2nd
... and so on.
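One caveat about the print line in the example: the format string has exactly five placeholders, so a row with fewer than five cells would raise an IndexError. A minimal sketch of a more tolerant formatter follows. The helper name format_row and the short 'Total' row are hypothetical, added only for illustration; they are not part of the original answer or of the Wikipedia table.

```python
def format_row(cells, width=25):
    """Left-align each cell in a fixed-width column, for any number of cells."""
    # '{:<{w}}' pads the cell text on the right up to `w` characters.
    padded = ('{:<{w}}'.format(cell, w=width) for cell in cells)
    # Join columns with one space, then drop the padding after the last cell.
    return ' '.join(padded).rstrip()


rows = [
    ['Barangay', 'Population (2004)', 'Population (2010)[51]', 'Area (km2)', 'District'],
    ['Bangkal', '22,433', '23,378', '0.74', '1st'],
    ['Total'],  # hypothetical short row, to show that fewer cells do not fail
]

for row in rows:
    print(format_row(row))
```

In the scraping loop, this could stand in for the fixed five-column format call by passing it the list of cell texts collected for each tr.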