Question

I am trying to get a list of teams and scores from this page, http://stats.rleague.com/rl/seas/2014.html, purely as a learning exercise.

Here are my imports and the page fetch; I am not getting the results I expect.

In [1]: from lxml import html

In [2]: import requests

In [3]: page = requests.get('http://stats.rleague.com/rl/seas/2014.html')

In [4]: tree = html.fromstring(page.text)

This is the HTML for the title:

<html><title>Rugby League Tables / Season 2014</title>

And for the teams:

<tr><td width=20%><a href="../teams/souths/souths_idx.html">Souths</a></td><td width=12%>4t 6g </td><td width=5%> 28</td><td><b>Date:</b>Thu 06-Mar-2014 <b>Venue:</b><a href="../venues/stadium_australia.html">Stadium Australia</a> <b>Crowd:</b>27,282</td></tr>
<tr><td width=20%><a href="../teams/easts/easts_idx.html">Sydney Roosters</a></td><td width=12%>1t 2g </td><td width=5%> 8</td><td><b>Souths</b> won by <b> 20 pts</b>

But I get empty lists. What am I doing wrong?

In [6]: print(tree)
<Element html at 0x7f518067fc78>

In [7]: titles = tree.xpath('//html[@title]/text()')

In [8]: print(titles)
[]

In [11]: teams = tree.xpath('//tr/td[@href]/text()')

In [12]: print(teams)

[]

Answer 1

Changing the XPath expressions will give you the results you want:

# `title` is not an attribute, but a tag.
titles = tree.xpath('.//title/text()')
print(titles)

# `href` is an attribute of the `a` tag, not of `td`.
teams = tree.xpath('//tr/td/a[@href]/text()')
print(teams)
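To check the corrected expressions without hitting the network, here is a minimal sketch that parses only the two rows quoted in the question. The `SNIPPET` string, the `<body>`/`<table>` wrapper, the closing tags on the second row, and the `results` list are assumptions added for illustration; the real page may contain rows with a different layout. Note that on these rows `//tr/td/a[@href]` also matches the venue link, so the sketch takes the team name from the first cell and the score from the third.

```python
from lxml import html

# Inline copy of the markup quoted in the question (wrapper tags are assumed),
# so the example runs offline.
SNIPPET = """
<html><title>Rugby League Tables / Season 2014</title>
<body><table>
<tr><td width=20%><a href="../teams/souths/souths_idx.html">Souths</a></td><td width=12%>4t 6g </td><td width=5%> 28</td><td><b>Date:</b>Thu 06-Mar-2014 <b>Venue:</b><a href="../venues/stadium_australia.html">Stadium Australia</a> <b>Crowd:</b>27,282</td></tr>
<tr><td width=20%><a href="../teams/easts/easts_idx.html">Sydney Roosters</a></td><td width=12%>1t 2g </td><td width=5%> 8</td><td><b>Souths</b> won by <b> 20 pts</b></td></tr>
</table></body></html>
"""

tree = html.fromstring(SNIPPET)

# `title` is an element, so select it by name rather than as an attribute.
titles = tree.xpath('//title/text()')

# Every link inside a table cell: team names, but also the venue.
all_links = tree.xpath('//tr/td/a[@href]/text()')

# Pair each team with its score by reading fixed cells within each row.
results = []
for row in tree.xpath('//tr'):
    names = row.xpath('./td[1]/a/text()')   # first cell holds the team link
    scores = row.xpath('./td[3]/text()')    # third cell holds the total score
    if names and scores:
        results.append((names[0].strip(), int(scores[0].strip())))

print(titles)
print(all_links)
print(results)
```

Reading cells by position within each `tr` keeps a team and its score together, which a single flat `//tr/td/...` query does not do.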
