Even though I am using the lxml parser, Beautiful Soup seems to be missing a lot of the page. Say I want to scrape the names of all 401 (football) players from the following site with Beautiful Soup:
from bs4 import BeautifulSoup
import requests
url = 'https://www.dreamteamfc.com/statistics/form-guide/all'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
name_list = []
for td in soup.find_all("td", {"class": "tabName"}):
    name = td.text.strip()
    if name:
        name_list.append(name)
print(len(name_list))
print(name_list[-1])
Output:
88
Santon, Davide
Now, the same thing using the lxml library directly:
from lxml import html
import requests
url = 'https://www.dreamteamfc.com/statistics/form-guide/all'
r = requests.get(url)
tree = html.fromstring(r.text)
name_list = []
names = tree.xpath('//td[@class="tabName"]/text()')
for name in names:
    name = name.strip()
    if name:
        name_list.append(name)
print(len(name_list))
print(name_list[-1])
Output:
401
Santon, Davide
In general I prefer Beautiful Soup, so is there a good way to get this working?
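For reference, my guess is that the parser gives up partway through the page's (possibly malformed) markup, so only part of the table ends up in the soup. A minimal variation I could try, assuming the html5lib package is installed, would be to swap the parser string and see whether the count changes:

from bs4 import BeautifulSoup
import requests

url = 'https://www.dreamteamfc.com/statistics/form-guide/all'
r = requests.get(url)

# Same scrape as above, but with the more lenient html5lib parser instead of lxml
soup = BeautifulSoup(r.text, 'html5lib')
name_list = [td.text.strip() for td in soup.find_all("td", {"class": "tabName"})
             if td.text.strip()]

print(len(name_list))
print(name_list[-1])

Would switching the parser like this be the right approach, or is there something else I am doing wrong on the Beautiful Soup side?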