Question

I am trying to scrape the column names (Player, Cost, Sel., Form, Pts.) from the page below:

https://fantasy.premierleague.com/a/statistics/total_points

However, I have not managed to do so. Before going further, let me show you what I have done.

from lxml import html
import requests

# Fetch the page and parse the response body into an element tree
url = 'https://fantasy.premierleague.com/a/statistics/total_points'
page = requests.get(url)
tree = html.fromstring(page.content)

# Using the page's CSS classes, select the header cell of the "Player" column
Location = tree.cssselect('.ism-thead-bold tr .ism-table--el-stats__name')

When I do this, Location should be a list containing the string "Player". However, it returns an empty list, which means cssselect did not match anything.
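One way to narrow this down is to check whether the class name appears in the raw HTML at all. If it does not, the selector cannot match whatever its syntax, and one possible explanation (an assumption here, not something the question confirms) is that the table is built by JavaScript after the page loads. The helper name `class_in_raw_html` and both sample strings below are hypothetical and only illustrate the check.

```python
def class_in_raw_html(html_text, class_name):
    """Return True if class_name occurs anywhere in the raw HTML text."""
    return class_name in html_text


# Hypothetical stand-ins for the downloaded page:
# one with the table in the markup, one where a script would build it later.
static_html = (
    '<table><thead class="ism-thead-bold"><tr>'
    '<th class="ism-table--el-stats__name">Player</th>'
    '</tr></thead></table>'
)
dynamic_html = '<div id="app"></div><script src="app.js"></script>'

print(class_in_raw_html(static_html, "ism-table--el-stats__name"))
print(class_in_raw_html(dynamic_html, "ism-table--el-stats__name"))
```

With the real page, `page.text` from the `requests.get` call above would be passed as `html_text`.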

Although each column name has a different "first class", I used just one of them (ism-table--el-stats__name) for simplicity.

Once this problem is solved, I would like to use a regular expression, since each class has a different suffix after the two underscores.
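The suffix matching described above could be sketched as follows. This is a minimal illustration, not a confirmed solution: it uses the standard library's `html.parser` instead of lxml so that it runs without extra packages, the sample markup is hypothetical, and only the `name` suffix comes from the question (`cost` is assumed). The same regular expression could be applied to `element.get('class')` on lxml elements.

```python
import re
from html.parser import HTMLParser

# Matches a class such as "ism-table--el-stats__name" and captures the suffix.
STATS_CLASS = re.compile(r"^ism-table--el-stats__(\w+)$")


class StatsHeaderParser(HTMLParser):
    """Collect (suffix, text) pairs for <th> cells whose class matches STATS_CLASS."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._suffix = None  # suffix of the matching <th> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag != "th":
            return
        # A class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            match = STATS_CLASS.match(name)
            if match:
                self._suffix = match.group(1)
                break

    def handle_data(self, data):
        # Record the first non-blank text inside a matching <th>.
        if self._suffix is not None and data.strip():
            self.headers.append((self._suffix, data.strip()))
            self._suffix = None

    def handle_endtag(self, tag):
        if tag == "th":
            self._suffix = None


# Hypothetical markup; the real page structure may differ.
sample = """
<table><thead class="ism-thead-bold"><tr>
  <th class="ism-table--el-stats__name">Player</th>
  <th class="ism-table--el-stats__cost other">Cost</th>
  <th class="unrelated">Ignored</th>
</tr></thead></table>
"""

parser = StatsHeaderParser()
parser.feed(sample)
print(parser.headers)
```

Splitting the class attribute before matching keeps the pattern anchored to a single class name, so a cell with several classes is still recognized.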

If anyone can help me with these two tasks, I would really appreciate it!

Thank you all.

Python web scraping using lxml

0 answers: