When I run the following code, I get a "list index out of range" error:
import requests
from lxml.html import fromstring

def get_values():
    print('executing get_values...')
    url = 'https://sports.yahoo.com/nba/stats/weekly/?sortStatId=POINTS_PER_GAME&selectedTable=0'
    response = requests.get(url)
    parser = fromstring(response.text)
    for i in parser.xpath('//tbody/tr')[:100]:
        FGM = i.xpath('.//td[4]/span/text()')[0]  # This runs with no error even though it has a similar XPath.
        print('FGM: ' + FGM)
        G = i.xpath('.//td[2]/span/text()')[0]
        print(G)

values = get_values()
Running the code produces this error:
    G = i.xpath('.//td[2]/span/text()')[0]
IndexError: list index out of range
I tried to debug it with the following statements:
print(parser.xpath('//tbody/tr/td[2]/span/text()'))  # Returns ['4', '4', '3', '3', '3', '4', '4', '3', '2', '4', '3']
print(parser.xpath('//tbody/tr/td[2]/span/text()')[0])  # Returns '4'
print(len(parser.xpath('//tbody/tr/td[2]/span/text()')[0]))  # Returns 1 (the length of the string '4')
The output shows the expected values, so I'm not sure why the loop fails. Any help would be appreciated!
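The debug prints above can look fine while the loop still fails: the flat query `//tbody/tr/td[2]/span/text()` returns only the matches, so rows without a match simply vanish from its result. A minimal sketch of a more telling check compares the number of rows with the number of matches and lists the rows that would raise `IndexError`. It runs on a small hypothetical table (`SAMPLE_HTML`), not the live Yahoo page, whose real markup may differ.

```python
from lxml.html import fromstring

# Hypothetical stand-in for the real page: a <th> header row, two rows whose
# second cell wraps the value in <span>, and one row with a bare value.
SAMPLE_HTML = (
    "<table><tbody>"
    "<tr><th>Player</th><th>G</th></tr>"
    "<tr><td>A</td><td><span>4</span></td></tr>"
    "<tr><td>B</td><td><span>3</span></td></tr>"
    "<tr><td>C</td><td>2</td></tr>"
    "</tbody></table>"
)

parser = fromstring(SAMPLE_HTML)
rows = parser.xpath('//tbody/tr')
values = parser.xpath('//tbody/tr/td[2]/span/text()')

# The flat query skips rows with no match, so these two counts differ.
print(len(rows), len(values))

# Rows where the per-row query returns [] are the ones where [0] raises IndexError.
missing = [index for index, row in enumerate(rows)
           if not row.xpath('./td[2]/span/text()')]
print(missing)
```

Here the header row (index 0) and the bare-text row (index 3) are the ones the flat query never shows.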
Answer 0 (score: 1)
It fails because the second <td> does not always contain a <span>. This should work:
import requests
from lxml.html import fromstring

def get_values():
    print('executing get_values...')
    url = 'https://sports.yahoo.com/nba/stats/weekly/?sortStatId=POINTS_PER_GAME&selectedTable=0'
    response = requests.get(url)
    parser = fromstring(response.text)
    for i in parser.xpath('//tbody/tr')[:100]:
        FGM = i.xpath('.//td[4]/span/text()')[0]  # This runs with no error even though it has a similar XPath.
        print('FGM: ' + FGM)
        G = i.xpath('.//td[2]/text()|.//td[2]/span/text()')[0]  # <--- Changed this
        print(G)

values = get_values()
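A small offline check of the union query, assuming a hypothetical two-row table where the second cell is wrapped in a <span> in one row and holds bare text in the other:

```python
from lxml.html import fromstring

# Hypothetical sample: td[2] holds a <span> in the first row, bare text in the second.
SAMPLE_HTML = (
    "<table><tbody>"
    "<tr><td>A</td><td><span>4</span></td></tr>"
    "<tr><td>C</td><td>2</td></tr>"
    "</tbody></table>"
)

parser = fromstring(SAMPLE_HTML)
games = []
for row in parser.xpath('//tbody/tr'):
    # "|" is the XPath union: it matches text directly under td[2] or under its
    # <span>, returned in document order, so [0] takes the first text node found.
    games.append(str(row.xpath('.//td[2]/text()|.//td[2]/span/text()')[0]))

print(games)  # ['4', '2']
```

A row with no text under td[2] at all, such as a header row made of <th> cells, would still make `[0]` raise `IndexError`.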
Answer 1 (score: 1)
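A query such as //foo/bar/qux, which selects only the elements that match the whole path, is not the same as querying //foo, iterating over the results, and expecting every one of them to have ./bar/qux. There can be plenty of <foo> elements without a <bar> or a <qux>.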
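The difference can be reproduced with lxml on a made-up XML document (the `foo`/`bar`/`qux` names are placeholders, as in the text above):

```python
from lxml import etree

# Hypothetical XML: only the first <foo> has the full foo/bar/qux chain.
doc = etree.fromstring(
    "<root><foo><bar><qux>1</qux></bar></foo><foo><bar/></foo><foo/></root>"
)

# A single path query returns only the matches; the incomplete <foo>s never appear.
flat = doc.xpath('//foo/bar/qux/text()')
print(flat)

# Iterating //foo visits every <foo>, and the relative query is empty for two of them.
per_foo = [foo.xpath('./bar/qux/text()') for foo in doc.xpath('//foo')]
print([len(matches) for matches in per_foo])
```

Indexing `[0]` into either of the empty per-`foo` lists is exactly what raises `IndexError` in the question's loop.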
For example, the page source contains this <tr>:
<tr class="Bgc(secondary-enhanced):h" data-reactid="1522">
<th class="Px(cell-padding-x) Py(cell-padding-y) Bd...>
This <tr> contains no <td> cells at all, only <th> cells (it is a header row).
import requests
from lxml.html import fromstring

def get_values():
    print('executing get_values...')
    url = 'https://sports.yahoo.com/nba/stats/weekly/?sortStatId=POINTS_PER_GAME&selectedTable=0'
    response = requests.get(url)
    parser = fromstring(response.text)
    # Only iterate over rows that actually have a <span> in both td[4] and td[2].
    for i in parser.xpath('//tbody/tr[td[4]/span and td[2]/span]')[:100]:
        FGM = i.xpath('.//td[4]/span/text()')[0]
        print('FGM: ' + FGM)
        G = i.xpath('.//td[2]/span/text()')[0]
        print(G)
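Here the last two rows are left out of the result because their values are not wrapped in a <span> tag, so you will need some additional queries to select the right rows and extract the right values.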
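One possible form of that extra querying, sketched under assumptions: it swaps in the XPath `string()` function, which concatenates all text inside a cell whether or not it is wrapped in a <span> and returns an empty string instead of failing, and it filters rows with `tr[td]` to drop <th>-only header rows. The helper `cell_text` and the table in `SAMPLE_HTML` are hypothetical; only the column positions (G in td[2], FGM in td[4]) come from the question.

```python
from lxml.html import fromstring

# Hypothetical sample: a <th> header row, a row whose G/FGM values sit in
# <span>s, and a row where they are bare text (like the rows the filter drops).
SAMPLE_HTML = (
    "<table><tbody>"
    "<tr><th>Col1</th><th>G</th><th>Col3</th><th>FGM</th></tr>"
    "<tr><td>A</td><td><span>4</span></td><td>x</td><td><span>9</span></td></tr>"
    "<tr><td>C</td><td>2</td><td>y</td><td>3</td></tr>"
    "</tbody></table>"
)


def cell_text(row, column):
    """Hypothetical helper: stripped text of the row's column-th <td>, or None.

    string() concatenates all descendant text, so a value works with or without
    a <span>, and a missing cell yields '' (mapped to None) instead of raising.
    """
    text = row.xpath(f'string(td[{column}])').strip()
    return text or None


parser = fromstring(SAMPLE_HTML)

# The answer's strict filter keeps only rows where both cells use a <span>.
strict_rows = parser.xpath('//tbody/tr[td[4]/span and td[2]/span]')

# Looser filter: any row with <td> cells, which drops <th>-only header rows.
stats = [(cell_text(row, 2), cell_text(row, 4))
         for row in parser.xpath('//tbody/tr[td]')]

print(len(strict_rows))  # 1
print(stats)  # [('4', '9'), ('2', '3')]
```

The trade-off: `string()` also picks up any other text inside the cell, so if the real cells contain extra markup next to the value, the span-based queries may still be the safer choice for those columns.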