Beautiful Soup vs. lxml: is Beautiful Soup not very robust, or am I doing something wrong?

Posted: 2014-12-20 02:01:30

Tags: python html beautifulsoup lxml

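Even though I am using the lxml parser, Beautiful Soup seems to miss a lot of elements. Suppose I want to get all 401 (football) player names from the following website with Beautiful Soup: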

from bs4 import BeautifulSoup
import requests

url = 'https://www.dreamteamfc.com/statistics/form-guide/all'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

name_list = []

for td in soup.find_all("td", class_="tabName"):
    name = td.text.strip()
    if name:
        name_list.append(name)

print(len(name_list))
print(name_list[-1])

Output:

88
Santon, Davide

Now, the same thing using the lxml library directly:

from lxml import html
import requests

url = 'https://www.dreamteamfc.com/statistics/form-guide/all'

r = requests.get(url)
tree = html.fromstring(r.text)

name_list = []

names = tree.xpath('//td[@class="tabName"]/text()')
for name in names:
    name = name.strip()
    if name:
        name_list.append(name)

print(len(name_list))
print(name_list[-1])

Output:

401
Santon, Davide

In general I like Beautiful Soup, so is there a good way to make it return all 401 names here?
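One way to narrow down where the names go missing might be to count the cells without either library. The following is only a rough sketch using the standard library's `html.parser`; the helper names `TabNameCollector` and `extract_tab_names` are made up for illustration, and it assumes each name sits directly inside a `td` whose class list contains `tabName`. It does not explain the discrepancy; it only gives a third count to compare against 88 and 401.

```python
from html.parser import HTMLParser


class TabNameCollector(HTMLParser):
    """Collect the stripped text of every <td> whose class list has 'tabName'.

    Hypothetical helper for diagnosis only; not part of bs4 or lxml.
    """

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_cell = False
        self._buffer = []

    def _flush(self):
        # Store the text gathered for the current cell, if it is non-empty.
        if self._in_cell:
            name = "".join(self._buffer).strip()
            if name:
                self.names.append(name)
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # A new <td> implicitly ends a previous one that was never closed.
            self._flush()
            classes = (dict(attrs).get("class") or "").split()
            self._in_cell = "tabName" in classes

    def handle_endtag(self, tag):
        if tag == "td":
            self._flush()

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)

    def close(self):
        super().close()
        # Keep a trailing cell even if the document ends without </td>.
        self._flush()


def extract_tab_names(markup):
    """Return the list of non-empty tabName cell texts found in markup."""
    collector = TabNameCollector()
    collector.feed(markup)
    collector.close()
    return collector.names
```

Calling `extract_tab_names(r.text)` on the same response and printing the length of the result would give a count that does not depend on how bs4 or lxml repair the markup.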

0 answers:

No answers